
This invention relates to development of novel 
binding proteins by an iterative process of 
mutagenesis, expression, chromatographic selection, and 
amplification.


The amino acid sequence of a protein determines
its three-dimensional (3D) structure, which in turn
determines protein function [citation illegible]. The
system of classification of protein structure of Schulz
and Schirmer (SCHU79, Ch 5) is adopted herein.

The 3D structure of a protein is essentially
unaffected by the identity of the amino acids at some
loci; at other loci only one or a few types of amino
acid are allowed [citation illegible]. Generally,
loci where wide variety is allowed have the amino acid
side group directed toward the solvent, while limited
variety is allowed where the side group is directed
toward other parts of the protein. (See also SCHU79,
p[illegible], and [illegible], p[illegible], 314-315.)



The secondary structure (helices, sheets, turns, loops) of
a protein is determined mostly by local sequence.
Certain amino acids tend to be correlated
with certain secondary structures, and the commonly used
Chou-Fasman (CHOU74, CHOU78a, CHOU78b) rules depend on
these correlations. However, every amino acid type has
been observed in helices and in both parallel and
antiparallel sheets. Pentapeptides of identical
sequence are found in different proteins; in some cases
the conformations of the pentapeptides are very
different [citation illegible].

Turns and loops tolerate insertions and deletions 
more readily than do other secondary structures 
([illegible], [illegible], SUTC87a); related proteins differ most
in loops and turns.

Changing three residues in subtilisin from
Bacillus [species illegible] to be the same as the
corresponding residues in subtilisin from
[organism illegible] produced a protease that had nearly the
same activity as the subtilisin from the latter
organism; 82 differences remained in the sequences.
The three residues changed were chosen because they
were the only differences within 7 Angstroms (Å) of the
active site [citation illegible].

Schulz and Schirmer summarise many observations on
the binding of proteins to other molecules (SCHU79,
p98-105). For example, haemoglobin alpha chains bind
very tightly to haemoglobin beta chains (delta G more
negative than -11.0 Kcal/mole); antibodies bind tightly
to antigens (K_d's range from 10^-[illegible] M to
10^-[illegible] M; K_d is the dissociation constant,
equal to [A][B]/[A:B]); basic
bovine pancreatic trypsin inhibitor (BPTI) binds
tightly to trypsin (K_d = 6.0 x 10^-14 M (TSCH87), delta
G = -18.0 Kcal/mole); and avidin binds to biotin (K_d =
1.3 x 10^-15 M (CREI84, p362)). In each case the
binding results from complementarity of the surfaces
that come into contact: bumps fit into holes, unlike
charges come together, dipoles align, and hydrophobic
atoms contact other hydrophobic atoms. Although bulk
water is excluded, individual water molecules are
frequently found filling space in intermolecular
interfaces; these waters usually form hydrogen bonds to
one or more atoms of the protein or to other bound
water.
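
The free energies and dissociation constants quoted above are linked by the standard relation delta G = RT ln K_d. The following Python sketch converts between the two. It is illustrative only: the temperature of 25 degrees C is an assumption (the text states none), the function names are hypothetical, and the K_d values are those read from the passage above.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)


def delta_g_from_kd(kd_molar, temp_k=298.15):
    """Standard free energy of binding (kcal/mol) for a K_d given in mol/L.

    temp_k defaults to 25 degrees C, which is an assumption of this sketch.
    """
    if kd_molar <= 0:
        raise ValueError("K_d must be positive")
    return R_KCAL * temp_k * math.log(kd_molar)


def kd_from_delta_g(delta_g_kcal, temp_k=298.15):
    """Dissociation constant (mol/L) for a binding free energy in kcal/mol."""
    return math.exp(delta_g_kcal / (R_KCAL * temp_k))


if __name__ == "__main__":
    # BPTI-trypsin, K_d = 6.0e-14 M as read from the text
    print(round(delta_g_from_kd(6.0e-14), 1))
    # avidin-biotin, K_d = 1.3e-15 M as read from the text
    print(round(delta_g_from_kd(1.3e-15), 1))
    # K_d corresponding to the haemoglobin bound of -11.0 kcal/mol
    print(f"{kd_from_delta_g(-11.0):.1e}")
```

Under these assumptions the BPTI-trypsin K_d corresponds to about -18.0 Kcal/mole, in agreement with the figure quoted in the text.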



The factors affecting protein binding are known
(CHOT75, CHOT76, SCHU79, p98-107, and [illegible], Ch8), but
designing new complementary surfaces has proved
difficult. Although some rules have been developed for
substituting side groups (SUTC87b), the side groups of
proteins are floppy and it is difficult to predict what
conformation a new side group will take. Further, the
forces that bind proteins to other molecules are all
relatively weak and it is difficult to predict the
effects of these forces. Hence, it is difficult to
design superior binding proteins based on theory alone
[citation illegible].

Enzyme-substrate affinity, however, has
fortuitously been increased by protein engineering
[citation illegible]. A point mutant of tyrosyl tRNA synthetase of
Bacillus stearothermophilus exhibits a 100-fold
increase in affinity for ATP. Substitution of one
amino acid for another at a surface locus may
profoundly alter binding properties of the protein
other than substrate binding, without affecting the
tertiary structure of the protein. For example, in
sickle-cell haemoglobin the change of the surface
residue E6 to V in the beta chains causes
deoxyhaemoglobin-S to form fibers through self binding
([citation illegible], p125-145); the tertiary and quaternary
structure of the haemoglobin are not changed (PADL85,
[remainder of citation illegible]).

Changing a single amino acid in BPTI greatly
reduces its binding to trypsin, but some of the new
molecules retain the parental characteristics of
[text illegible]
exhibit new binding to elastase ([illegible], TSCH87).
Changes of single amino acids on the surface of the
lambda Cro repressor greatly reduce its affinity for
the natural operator OR3, but greatly increase the
binding of the mutant protein to a mutant operator
[citation illegible]. Thus changing the surface of a binding
protein may alter its specificity without abolishing
binding activity.

The recently developed techniques of "reverse
genetics" have been used to produce single specific
mutations at precise base pair loci (OLIP86,
OLIP87, and AUSU87). Mutations are generally detected by
sequencing and in some cases by loss of wild-type
function. These procedures allow researchers to
analyse the function of each residue in a protein
[citation illegible] or of each base pair in a regulatory DNA
sequence [citation illegible]. In these analyses, the norm has
been to strive for the classical goal of obtaining
mutants carrying a single alteration (AUSU87).

Reverse genetics is often applied to coding
regions to determine which residues are most important
to protein structure and function; isolation of a
single mutant at each residue of the protein gives an
initial estimate of which residues play crucial roles.

Prior to the method of the present invention, two 
general approaches have been developed to create novel
mutant proteins through reverse genetics. In one
approach, dubbed "protein surgery" (DILL87), a specific
substitution is introduced at a single protein residue
to determine the effects on structure and function of
[text illegible]
multiple amino acid substitutions and thus are not
accessible through single base changes or even through
all possible amino acid substitutions at any one
residue.

The other approach has been randomly to generate a 
variety of mutants at many loci within a cloned gene
using mutagenic chemicals or radiation. The specific
location and nature of the change are determined by DNA
sequencing [citation illegible]. This approach is limited by the
number of colonies that can be examined. Also, it does
not take advantage of any knowledge of the protein
structure and its relationship to binding activity.

Progress toward rules governing substitutions of 
amino acids [citation illegible] has been greatly hampered by the
extensive efforts involved in using either method and
the practical limitations on the number of colonies
that can be inspected [citation illegible].

The term "saturation mutagenesis" with reference 
to synthetic DNA is generally taken to mean generation 
of a population in which: a) every possible single-base
change within a fragment of a gene or DNA regulatory
region is represented, and b) most mutant genes contain
only one mutation. Thus a set of all possible single
mutations for a 6 base pair length of DNA comprises a
population of 18 mutants. Oliphant et al. (OLIP86) and
Oliphant and Struhl (OLIP87) have demonstrated ligation
and cloning of highly degenerate oligonucleotides and
have applied saturation mutagenesis to the study of
promoter sequence and function. They suggest that
similar methods could be used to study genetic
expression of proteins, but they do not say how to: a)
choose protein residues to vary, or b) select or screen
mutants with desirable properties.
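
The figure of 18 follows from three alternative bases at each of six positions. A minimal Python sketch of that count is given below; the function name and the example hexamer are illustrative assumptions and are not taken from the cited work.

```python
BASES = "ACGT"


def single_base_mutants(seq):
    """Return every sequence that differs from seq at exactly one position."""
    seq = seq.upper()
    mutants = []
    for i, original in enumerate(seq):
        for base in BASES:
            if base != original:
                # Replace position i only; all other positions keep the parent base.
                mutants.append(seq[:i] + base + seq[i + 1:])
    return mutants


if __name__ == "__main__":
    variants = single_base_mutants("GAATTC")  # arbitrary 6 bp example
    print(len(variants))  # 3 alternatives x 6 positions = 18
```

In general a fragment of n base pairs has 3n single-base mutants, which is why a saturating population in this sense stays small even for fairly long fragments.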

Reidhaar-Olson and Sauer (REID88) have used
synthetic degenerate oligo-nts to vary simultaneously
two or three residues through all twenty amino acids in
the dimer interface of cI repressor from bacteriophage
lambda. They give no discussion of the limits on how
many residues could be varied at once, nor do they
mention the problem of unequal abundance of DNA
encoding different amino acids. They looked for
proteins that either had wild-type dimerisation or that
did not dimerise. They did not seek proteins having
novel binding properties and did not report any.

Several researchers have designed and synthesized
proteins de novo. These designed proteins are small
and most have been synthesized in vitro as polypeptides
rather than genetically. Gutte and colleagues have
made a polypeptide that binds DDT in 53% ethanol
(MOSE83). Recently Moser et al. (MOSE87) reported
genetic expression in E. coli both of the designed 24
residue DDT-binding protein and of fusions of the DDT-
binding sequence to LacI. They state that design of
biologically active proteins is currently impossible.

Erickson et al. (ERIC86) have designed and
synthesized a series of proteins that they have named
betabellins, that are meant to have beta sheets. They
suggest use of polypeptide synthesis with mixed
reagents to produce several hundred analogous
betabellins, and use of a column to recover analogues
with high affinity for a chosen target compound bound
to the column. They envision successive rounds of
mixed synthesis of variant proteins and purification by
specific binding. They do not discuss how residues
should be chosen for variation. Because proteins
cannot be amplified, the researchers must sequence the
recovered protein to learn which substitutions improve
binding. The researchers must limit the level of
diversity so that each variety of protein will be
present in sufficient quantity for the isolated
fraction to be sequenced.

Methods have been developed to separate cells
through their affinity to various substances. Methods
applied to animal cells reveal common problems: a) non-
specific interactions between cells and affinity
supports, and b) irreversible binding of cells to
affinity matrices (BONN85).

Ferenci and collaborators have published a series
of papers on the chromatographic isolation of mutants
of the maltose-transport protein LamB of E. coli
(WAND79, FERE80a, FERE80b, FERE80c, FERE82a, FERE82b,
FERE83, CLUN84, FERE86a, FERE86b, FERE86c, FERE87a,
FERE87b, HEIN87, and HEIN88). The papers report that
spontaneous and induced mutants at the lamB genetic
locus can be isolated by chromatography over a column
supporting immobilized maltose, maltodextrins, or
starch. The reports speculate that other applications
are possible, but specifically mention only the
elucidation of the residues responsible for the
selectivity of the maltodextrin pore or similar pore
proteins. The mutant proteins were non-chimeric, and
no attempt was made to obtain binding to a new target.
Both FERE86a and CLUN84 point up the
difficulties of working with live bacteria that can
metabolise chemicals and change their physiological
[remainder of sentence illegible].

A fragment of a heterologous gene can be
introduced into bacteriophage f1 gene III (SMIT85). If
the inserted gene preserves the original reading frame,
expression of the altered gene III causes an inserted
domain to appear in the gene III protein. The
resulting strain of f1 virions is adsorbed by an
antibody against the protein encoded by the
heterologous DNA. The phage were eluted at pH 2.2 and
retained some infectivity. However, the single copy of
f1 gene III was used for insertion of the heterologous
gene so that all copies of gene III protein were
affected; infectivity of the resultant phage was
reduced 23-fold.

Smith presented his method as a way to isolate
cloned genes using antibodies to the gene products. He
made no mention of mutagenising the inserted genetic
material or of inducing novel binding properties in the
inserted protein domain.

A fragment of the repeat region of the
circumsporozoite protein from Plasmodium falciparum has
been expressed on the surface of M13 as an insert in
the gene III protein (CRUZ88). The recombinant phage
were both antigenic and immunogenic in rabbits. The
authors do not suggest mutagenesis of the inserted
material.

Gene fragments coding for hepatitis B virus
antigens have been fused to fragments of lamB, and if
the fusion is in a region coding for exposed domains of
LamB, the HBV antigens appear on the cell surface and
are immunogenic (CHAR87). Charbit et al. (CHAR87)
suggest use of these engineered strains for development
of a live bacterial vaccine; they did not suggest
mutagenesis of the fused heterologous gene fragments,
nor development of binding capabilities.

Ladner, US Patent No. 4,704,692, "Computer Based
System and Method for Determining and Displaying
Possible Chemical Structures for Converting Double- or
Multiple-Chain Polypeptides to Single-Chain
Polypeptides" describes a design method for converting
proteins composed of two or more chains into proteins
of fewer polypeptide chains, but with essentially the
same 3D structure. There is no mention of variegated
DNA and no genetic selection. Ladner and Bird,
WO88/01649 (publ. March 10, 1988) disclose the specific
application of computerised design of linker peptides
to the preparation of single chain antibodies.

Ladner, Glick and Bird, WO88/06630 (publ. 7 Sept.
1988) (LGB) speculate that diverse single chain
antibody domains may be screened for binding to a
particular antigen by varying the DNA encoding the
combining determining regions of a single chain
antibody, subcloning the SCAD gene into the gpV gene of
phage lambda so that a SCAD/gpV chimera is displayed on
the outer surface of the phage, and selecting phage
which bind to the antigen through affinity
chromatography. The only antigen mentioned is bovine
growth hormone. No other binding molecules, targets,
carrier organisms, or outer surface proteins are
discussed. Nor is there any mention of the method or
degree of mutagenesis.

Ladner and Bird, WO88/06601 (publ. 7 September 
1988) suggest that single chain "pseudodimeric"
repressors (DNA-binding proteins) may be prepared by
mutating a putative linker peptide followed by in vivo
selection, and that mutation and selection may be used to
create a dictionary of recognition elements for use in
the design of asymmetric repressors. The repressors
are not displayed on the outer surface of an organism.

No admission is made that any cited reference is 
prior art or pertinent prior art, and the dates given 
are those appearing on the reference and may not be
identical to the actual publication date. 

SUMMARY OF THE INVENTION

This invention relates to the construction,
expression, and selection of mutated genes that specify
novel proteins with desirable binding properties, as
well as these proteins themselves. The substances
bound by these proteins, hereinafter referred to as
"targets", may be, but need not be, proteins. Targets
may include other biological or synthetic
macromolecules as well as organic and inorganic
molecules.

The novel binding proteins may be obtained: 1) by
mutating a gene encoding a known binding protein within
the subsequence encoding a known binding domain, or 2)
by taking such a subsequence of the gene for a first
protein and combining it with all or part of a gene for
a second protein (which may or may not be itself a
known binding protein), or 3) by mutating a gene encoding
a protein which, while not possessing a known binding
activity, possesses a secondary or higher structure
that lends itself to binding activity (clefts, grooves,
etc.), or 4) by mutating a gene encoding a known
binding protein but not in the subsequence known to
cause the binding. The protein from which the novel
binding protein is derived need not have any specific
affinity for the target material.

In one embodiment, the invention relates to:

a) preparing a variegated population of replicable
genetic packages, each package including a nucleic
acid construct coding on expression for an outer-
surface-displayed potential binding protein
comprising (i) a structural signal directing the
display of the protein on the outer surface of the
package and (ii) a potential binding domain for
binding said target, where a plurality of
different potential binding domains are displayed
by the individual packages,

b) causing the expression of said protein and the
display of said protein on the outer surface of
such packages,

c) contacting the packages with target material so
that the potential binding domains of the proteins
and the target material may interact, and
separating packages bearing a potential binding
domain that succeeds in binding the target
material from packages that do not so bind,

d) recovering and replicating at least one package
bearing a successful binding domain,

e) determining the amino acid sequence of the
successful binding domain of a genetic package
which bound to the target material,

f) preparing a new variegated population of
replicable genetic packages according to step (a),
the parental potential binding domain for the
potential binding domains of said new packages
being a successful binding domain whose sequence
was determined in step (e), and repeating steps
(b)-(e) with said new population, and, when a
package bearing a binding domain of desired
binding characteristics is obtained,

g) abstracting the gene encoding the desired
binding domain from the genetic package and
placing it into a suitable expression system.

(The binding domain may then be expressed as a
unitary protein, or as a domain of a larger
protein.)

The invention further relates to a method of 
preparing a mixed population of replicable genetic 
packages in which each package includes a gene 
expressing a potential binding protein in such a manner 
that the protein is presented on the outer surface of 
the package. This method comprises: 

i) preparing a variegated population of DNA
inserts, each of which comprises a first
sequence which codes on expression for a potential
binding domain and a second sequence encoding a
signal directing that the encoded protein be
displayed on the outer surface of a chosen
replicable genetic package, and

ii) incorporating the resulting population of DNA
constructs into the chosen replicable genetic
packages to produce a population of replicable
genetic packages.

In a preferred embodiment, the potential-binding-
protein-encoding inserts are incorporated into a gene
encoding an outer-surface protein of the replicable
genetic package.

The invention encompasses the design and synthesis
of variegated DNA encoding a family of potential
binding proteins characterised by constant and variable
regions, said proteins being designed with a view
toward obtaining a protein that binds a predetermined
target.

For the purposes of this invention, the term
"potential binding protein" refers to a protein encoded
by one species of DNA molecule in a population of
variegated DNA wherein the region of variation appears
in one or more subsequences encoding one or more
segments of the polypeptide having the potential of
serving as a binding domain for the target substance.

From time to time, it may be helpful to speak of
the "parent sequence" of the variegated DNA. When the
novel binding domain sought is an analogue of a known
binding domain, the parent sequence is the sequence
that encodes the known binding domain. The variegated
DNA will be identical with this parent sequence at most
loci, but will diverge from it at chosen loci. When a
potential binding domain is designed from first
principles, the parent sequence is a sequence which
encodes the amino acid sequence that has been predicted
to form the desired binding domain, and the variegated
DNA is a population of "daughter DNAs" that are related
to that parent by a high degree of sequence similarity.

The fundamental principle of the invention is one 
of forced evolution. The efficiency of the forced
evolution is greatly enhanced by careful choice of
which residues are to be varied. The 3D structure of
the potential binding domain is a key determinant in
this choice. First a set of residues that can
simultaneously contact one molecule of the target is
identified. Then all or some of the codons encoding
these residues are varied simultaneously to produce a
variegated population of DNA. The variegated
population of DNA is used to transform cells so that a
variegated population of genetic packages is produced.

The mixed population of genetic packages
containing genes encoding possible binding proteins is
enriched for packages containing genes that express
proteins that in fact bind to the target ("successful
binding domains"). After one or more rounds of such
enrichment, one or more of the chosen genes are
examined and sequenced. If desired, new loci of
variation are chosen. The selected daughter genes of
one generation then become the parent sequences for the
next generation of variegated DNA, beginning the next
"variegation cycle." Such cycles are continued until a
protein with the desired target affinity is obtained.

The appended claims are hereby incorporated by
reference into this specification as an enumeration of
the preferred embodiments.

BRIEF DESCRIPTION OF THE DRAWINGS

Figure 1 is a schematic showing the relationships 
between various types of Binding Domains (BD).

Figure 2 is a flow chart showing the major steps used
to create a novel protein with affinity for a pre-
determined target.

Figure 3 is a schematic of a PBD contacting a molecule
of target material.

Figure 4 is a schematic of the construction of pLG3
from M13mp18 and pBR322.

Figure 5 is a schematic of the construction of pLG7
from pLG3 and synthetic DNA.

DETAILED DESCRIPTION OF THE
PREFERRED EMBODIMENTS

The present invention separates mutated genes that
specify novel proteins with desirable binding
properties from closely related genes that specify
proteins with no or undesirable binding properties, by:
1) arranging that the product of each mutated gene be
displayed on the outer surface of a replicable genetic
package that contains the gene, and 2) using affinity
separation incorporating a desirable target material to
enrich the population of packages for those packages
containing genes specifying proteins with improved
binding to that target material.

Let K_D(x,y) be a dissociation constant,

$$K_D(x,y) = \frac{[x][y]}{[x:y]}$$

For the purposes of the appended claims, a protein
P is a binding protein if

(1) for one molecular, ionic or atomic species A,
the dissociation constant K_D(P,A)
< 10^-6 moles/liter, and

(2) for a different molecular, ionic or
atomic species B, K_D(P,B) > 10^-1
moles/liter.

As a result of these two conditions, the protein P
exhibits specificity for A over B, and a minimum degree
of affinity (or avidity) for A.
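
The two-condition test above can be written as a short predicate. The Python sketch below is illustrative only: the function names are hypothetical, and the default thresholds (10^-6 and 10^-1 moles/liter) are the values as read from this text, whose exponents are hard to make out, so they are exposed as parameters rather than fixed.

```python
def dissociation_constant(free_x, free_y, complex_xy):
    """K_D = [x][y] / [x:y], with all concentrations in moles/liter."""
    if complex_xy <= 0:
        raise ValueError("complex concentration must be positive")
    return free_x * free_y / complex_xy


def is_binding_protein(kd_a, kd_b, max_kd_a=1e-6, min_kd_b=1e-1):
    """Apply conditions (1) and (2).

    kd_a: K_D(P,A) for the species that should be bound tightly.
    kd_b: K_D(P,B) for a different species that should be bound weakly.
    The default thresholds are assumptions taken from the text as read.
    """
    return kd_a < max_kd_a and kd_b > min_kd_b


if __name__ == "__main__":
    kd = dissociation_constant(1e-6, 1e-6, 1e-3)
    print(is_binding_protein(kd, 0.5))
```

Keeping the two thresholds as arguments makes it easy to apply a stricter affinity requirement without changing the test itself.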

When a domain of a protein is primarily 
responsible for the protein's ability to specifically
bind a chosen target, it is referred to herein as a
"binding domain" (BD). We engineer the appearance of a
stable protein domain, denoted as an "initial potential
binding domain" (IPBD), on the surface of a genetic
package. The present invention is concerned with the
expression of numerous, diverse, variant "potential
binding domains" (PBD), all related to a "parental
potential binding domain" (PPBD) such as the binding
domain of a known binding protein, and with selection
and amplification of the genes encoding the most
successful mutant PBDs. An IPBD is chosen as PPBD to
the first round of variegation. Selection-through-
binding isolates one or more "successful binding
domains" (SBD). An SBD from one round of variegation
and selection-through-binding is chosen to be the PPBD
for the next round. The invention is not, however,
limited to proteins with a single BD since the method
may be applied to any or all of the BDs of the protein,
sequentially or simultaneously. The relationships of
the various BDs are illustrated in Figure 1.

The term "variegated DNA" refers to a population
of molecules that have the same base sequence through
most of their length, but that vary at a limited number
of defined loci, preferably 5-10 codons. A molecule of
variegated DNA can be introduced into a plasmid so that
it constitutes part of a gene (OLIP86, OLIP87, AUSU87,
REID88). When plasmids containing variegated DNA are
used to transform bacteria, each cell makes a version
of the original protein. Each colony of bacteria may
produce a different version from any other colony. If
the variegations of the DNA are concentrated at loci
known to be on the surface of the protein or in a loop,
a population of proteins will be generated, many
members of which will fold into roughly the same 3D
structure as the parent protein. The specific binding
properties of each member, however, may be different
from each other member. It remains to sort out the
colonies containing genes for proteins with desirable
binding properties from those that do not exhibit the
desired affinities.
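
To give a feel for the size of such a population, the Python sketch below enumerates the variants of a parent sequence that differ only at chosen loci and counts how many exist. It is a simplified illustration under stated assumptions: it works at the amino acid level rather than on codons, it allows all twenty residues at every varied locus, and the parent sequence, loci and function names are hypothetical.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the twenty standard residues


def library_size(n_varied, alphabet_size=20):
    """Number of distinct sequences when n_varied loci are fully varied."""
    return alphabet_size ** n_varied


def variegate(parent, loci, alphabet=AMINO_ACIDS):
    """Yield every sequence equal to parent except at the given 0-based loci."""
    for combo in product(alphabet, repeat=len(loci)):
        seq = list(parent)
        for pos, residue in zip(loci, combo):
            seq[pos] = residue
        yield "".join(seq)


if __name__ == "__main__":
    # Varying 2 loci of a hypothetical 5-residue parent gives 20**2 variants.
    variants = list(variegate("MKTAY", [1, 3]))
    print(len(variants))
    # Fully varying 5 or 10 loci, the range mentioned in the text.
    print(library_size(5), library_size(10))
```

Full variation of 5 loci gives 3.2 million protein sequences and of 10 loci about 10^13, which illustrates why the colonies must be sorted by a selection rather than inspected one at a time.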

A "single-chain antibody" is a single chain
polypeptide comprising at least [number illegible]
amino acids forming two antigen-binding regions
connected by a peptide linker that allows the two
regions to fold together to bind the antigen. Either
the two antigen-binding regions must be variable
domains of known antibodies, or they must (1) each fold
into a beta barrel of nine strands that are spatially
related in the same way as are the nine strands of
known antibody variable light or heavy domains, and (2)
fit together in the same way as do the variable domains
of said known antibody. Generally speaking, this will
require that, with the exception of the amino acids
corresponding to the hypervariable region, there is at
least 88% homology with the amino acids of the variable
domain of a known antibody.

The term "affinity separation means" includes, but
is not limited to: a) affinity column chromatography,
b) batch elution from an affinity matrix material, c)
batch elution from an affinity material attached to a
plate, d) fluorescence activated cell sorting, and e)
electrophoresis in the presence of target material.
"Affinity material" is used to mean a material with
affinity for the material to be purified, called the
"analyte". In most cases, the association of the
affinity material and the analyte is reversible so that
the analyte can be freed from the affinity material
once the impurities are washed away.

Affinity column chromatography, batch elution from
an affinity matrix material held in some container, and
batch elution from a plate are very similar and
hereinafter will be treated under "affinity
chromatography."

Fluorescent-activated cell sorting involves use of
an affinity material that is fluorescent per se or is
labeled with a fluorescent molecule. Current
commercially available cell sorters require 800 to 1000
molecules of fluorescent dye, such as Texas red, bound
to each cell. FACS can sort 10^3 cells or viruses/sec.



Electrophoretic affinity separation involves 
electrophoresis of viruses or cells in the presence of 
10 target material, wherein the binding of said target 
material changes the net charge of the virus particles
or cells. It has been used to separate bacteriophages
on the basis of charge (SERW87).

The present invention makes use of affinity
separation of bacterial cells, or bacterial viruses (or
other genetic packages) to enrich a population for
those cells or viruses carrying genes that code for
proteins with desirable binding properties.

In the present invention, the words "select" and
"selection" are used exclusively in the genetic sense;
i.e. a biological process whereby a phenotypic
characteristic is used to enrich a population for those
organisms displaying the desired phenotype.

The process of the present invention comprises 
three major parts:

I. design and production of a replicable
genetic package (GP) that displays an IPBD on
the surface of the GP, denoted GP(IPBD),

II. design and implementation of an affinity
separation process that separates GP(IPBD)s
that bind to a known affinity molecule from
wild-type GPs or GP(IPBD-)s, neither of which
binds the known affinity molecule, and

III. design and implementation of a genetic
variegation method, denoted structure-
directed mutagenesis, wherein a population of
10^6 or more different GP(PBD)s, denoted
GP(vgPBD), is produced.

One affinity separation is called a "separation cycle";
one pass of variegation followed by as many separation
cycles as are needed to isolate an SBD is called a
"variegation cycle". The amino acid sequence of one
SBD from one round becomes the PPBD to the next
variegation cycle. We perform variegation cycles 
iteratively until the desired affinity and specificity 
of binding between an SBD and chosen target are 
achieved. 
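
The nesting of separation cycles inside variegation cycles can be pictured with a toy simulation. The Python sketch below is only an illustration of the loop structure: the scoring function stands in for a measured affinity, the ranking step stands in for a physical affinity separation, retaining the parent in each population is an added assumption so that the score never falls, and all names and parameter values are hypothetical.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def variegate(parent, loci, n_variants, rng):
    """Make a population that matches parent except at the chosen loci."""
    population = []
    for _ in range(n_variants):
        seq = list(parent)
        for pos in loci:
            seq[pos] = rng.choice(AMINO_ACIDS)
        population.append("".join(seq))
    return population


def separation_cycle(population, score, keep_fraction):
    """Toy affinity separation: keep the best-scoring fraction."""
    ranked = sorted(population, key=score, reverse=True)
    return ranked[: max(1, int(len(ranked) * keep_fraction))]


def forced_evolution(ipbd, loci_per_cycle, score, target_score,
                     n_variants=500, separations=2, keep_fraction=0.1, seed=0):
    """Run one variegation cycle per entry of loci_per_cycle."""
    rng = random.Random(seed)
    ppbd = ipbd  # the IPBD serves as the first PPBD
    for loci in loci_per_cycle:
        population = variegate(ppbd, loci, n_variants, rng)
        population.append(ppbd)  # assumption: parent retained in the pool
        for _ in range(separations):
            population = separation_cycle(population, score, keep_fraction)
        ppbd = population[0]  # chosen SBD becomes the next PPBD
        if score(ppbd) >= target_score:
            break
    return ppbd


def toy_affinity(seq, ideal="WWWWWW"):
    """Hypothetical stand-in for affinity: matches to an arbitrary ideal."""
    return sum(a == b for a, b in zip(seq, ideal))


if __name__ == "__main__":
    sbd = forced_evolution("AAAAAA", [[0, 1], [2, 3], [4, 5]],
                           toy_affinity, target_score=6)
    print(sbd, toy_affinity(sbd))
```

Each entry of `loci_per_cycle` corresponds to one choice of residues to vary, so changing the loci between cycles mirrors the option of choosing new loci of variation in each round.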

Part I is a strain construction in which we deal
with a single IPBD sequence. Variability may be
introduced into DNA subsequences adjacent to the ipbd
subsequence and within the osp-ipbd gene so that the
IPBD will appear on the GP surface. A molecule, such
as an antibody, having high affinity for correctly
folded IPBD is used to: a) detect IPBD on the GP
surface, b) screen colonies for display of IPBD on the
GP surface, or c) select GPs that display IPBD from a
population, some members of which might display IPBD on
the GP surface. In one preferred embodiment, Part I of
the process involves:

1) choosing a GP such as a bacterial cell (Sec.
1.1.1), bacterial spore (1.2.1), or phage (1.3.1),
having a suitable outer surface protein (Secs.
1.1.3, 1.2.3, and 1.3.3),

2) choosing a stable IPBD (Sec. 3),

3) designing an amino acid sequence that: a)
includes the IPBD as a subsequence and b) will
cause the IPBD to appear on the GP surface (Secs.
1.1.2, 1.2.2, 1.3.2, and 4),

4) engineering a gene, denoted osp-ipbd, that: a)
codes for the designed amino acid sequence, b)
provides the necessary genetic regulation, and c)
introduces convenient sites for genetic
manipulation (Secs. 4.1, 4.2, 4.3, 5.1, and 5.2),

5) [text illegible] (Sec.
6.1), and

6) harvesting the transformed GPs (Sec. 7) and
testing them for presence of IPBD on the GP
surface (Sec. 8); this test is performed with an
affinity molecule having high affinity for IPBD,
denoted AfM(IPBD).

In another preferred embodiment, Part I of the process
involves:

1) and 2) as above,

3) designing a DNA sequence that: a) encodes the
IPBD as a subsequence, b) contains suitable
restriction sites so that random DNA may be
operably linked to the ipbd gene fragment, and c)
provides the necessary genetic regulation; this
DNA sequence is called a "display probe" (Secs.
1.1.4, 1.2.4, 1.3.4, and 4),

4) constructing that display probe,

5) cloning the display probe into the OCV and
amplifying it in a suitable host,

6) cloning random or pseudorandom DNA into one of
the restriction sites provided in the display
probe (Sec. 6.2), whereby the random or
pseudorandom DNA functions as a potential
[illegible in source], and

7) harvesting GPs (Sec. 7) and screening colonies of
the transformed GPs for presence of IPBD on the GP
surface; this screening is performed with an
affinity molecule having high affinity for IPBD,
denoted AfM(IPBD) (Sec. 8); or, alternatively,

8) selecting GPs that display IPBD by use of an
affinity separation using AfM(IPBD) (Sec. 8).

Once a GP(IPBD) is produced, it can be used many
times as the starting point for developing different
novel proteins that bind to a variety of different
targets. The knowledge of how we engineer the
appearance of one IPBD on the surface of a GP can be
used to design and produce other GP(IPBD)s that display
different IPBDs.

Although Part I deals with only a single IPBD,
many preparations are made for Part III where we
introduce numerous mutations into the potential binding
domain. References to PBD or pbd in Part I are to
indicate a preparatory intent.



WO 90/02809                                PCT/US89/03731



In Part II we optimize separation of GP(IPBD) from
wild-type GP, denoted wtGP, based on the affinity of
IPBD for AfM(IPBD) and establish the sensitivity of the
affinity separation process. In a preferred
embodiment, Part II of the process of the present
invention involves:

1) preparing affinity columns bearing AfM(IPBD) at
various densities of AfM(IPBD)/(volume of matrix)
(Sec. 10.1),

2) preparing GP(IPBD)s with various amounts of
IPBD per GP,

3) picking a gradient regime for eluting the
columns (Sec. 10.1),

4) determining which combination of: a) IPBD/GP,
b) density of AfM(IPBD)/(volume of support), c)
initial ionic strength, d) elution rate, and e)
(amount of GP)/(volume of support) loaded, gives
the best separation of GP(IPBD) from wtGP (Sec.
10.1),

5) determining the smallest amount of GP(IPBD)
that can be isolated from a much larger amount of
wtGP using the optimal conditions (Sec. 10.2), and

6) determining the efficiency of the affinity
separation procedure (Sec. 10.3).
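
Step 4 above is a search over combinations of five factors. A minimal Python sketch of that enumeration follows; it assumes each factor is tried at a small set of discrete levels, and `score` is a hypothetical caller-supplied function returning a measured separation quality. The "low"/"high" levels and the toy score are placeholders with no experimental meaning.

```python
from itertools import product
from typing import Callable, Dict, List, Tuple


def best_condition(
    levels: Dict[str, List[str]],
    score: Callable[[Dict[str, str]], float],  # hypothetical measurement
) -> Tuple[Dict[str, str], int]:
    """Return the best-scoring combination and how many were tried."""
    names = list(levels)
    candidates = [dict(zip(names, combo))
                  for combo in product(*levels.values())]
    return max(candidates, key=score), len(candidates)


# Placeholder levels for the five factors named in step 4 (assumptions).
levels = {
    "ipbd_per_gp": ["low", "high"],
    "afm_density": ["low", "high"],
    "initial_ionic_strength": ["low", "high"],
    "elution_rate": ["low", "high"],
    "gp_load": ["low", "high"],
}


def toy_score(cond: Dict[str, str]) -> float:
    # Toy stand-in only: a real score would come from column experiments.
    return float(sum(v == "high" for v in cond.values()))


best, n_tested = best_condition(levels, toy_score)
```

With two levels per factor the full grid already holds 32 combinations, which suggests why a limited set of levels per factor is practical.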

Part II optimizes separation of a single type of
GP(IPBD) from a large excess of a single different GP.
The optimum conditions will be used in Part III to
separate GP(PBD)s that bind the target from GP(PBD)s
that do not bind the target. The optimization will be
at one or more specific temperatures and at one or more
specific pHs. In Part III, the user must specify the
conditions under which the selected SBD should bind the
target. If the conditions of intended use differ
markedly from the conditions for which affinity
separation was optimized, the user must return to Part
II and optimize the affinity separation for conditions
similar to the conditions of intended use of the
selected SBD.

In Part III, we choose a target material and a
GP(IPBD) that was developed by the method of Part I and
that is suitable to the target material. Using IPBD as
the PPBD to the first cycle of variegation, we prepare
a wide variety of osp-pbd genes that encode a wide
variety of PBDs. We use an affinity separation,
developed by the method of Part II, to enrich the
population of GP(vgPBD)s for GPs that display PBDs with
binding properties relative to the target that are
superior to the binding properties of the PPBD. An SBD
selected from one variegation cycle becomes the PPBD to
the next variegation cycle. In a preferred embodiment,
Part III of the process of the present invention
involves:

1) picking a target molecule (Sec. 11),

2) picking a GP(IPBD) (Sec. 12),

3) picking a set of several residues in the PPBD
to vary based on a) the 3D structure of the IPBD,
b) sequences of homologous proteins, and c)
computer or theoretical modeling that indicates
which residues can tolerate different amino acids
without disrupting the underlying structure (Sec.
[illegible in source]),

4) picking a subset of the residues to be varied
simultaneously based on the number of different
variants and which variants are within the
detection capabilities of the affinity separation
(Sec. 13.2),

5) implementing the variegation by:

a) synthesizing the part of the osp-pbd gene
that encodes the residues to be varied using a
specific mixture of nucleotide substrates for
some or all of the bases encoding residues
slated for variation, thereby creating a
population of DNA molecules, denoted vgDNA
(Sec. 13.3),

b) ligating this vgDNA, by standard methods,
into the operative cloning vector (OCV) (i.e.
a plasmid or bacteriophage) (Sec. 14.1),

c) using the ligated DNA to transform cells,
thereby producing a population of transformed
cells (Sec. 14.2),

d) culturing (i.e. increasing in number) the
population of transformed cells and harvesting
the population of GP(PBD)s, said population
being denoted as GP(vgPBD) (Sec. 14.3),

e) enriching the population for GPs that bind
the target by using the affinity separation
process developed in Part II, with the chosen
target molecule as affinity molecule (Sec. 15),

f) repeating steps III.5.d and III.5.e until a
GP(SBD) having improved binding to the target
is isolated (Sec. 15), and

g) testing the isolated SBD or SBDs for
affinity and specificity for the chosen target
(Sec. 15.8),

6) repeating steps III.3, III.4, and III.5 until
the desired degree of binding is obtained.
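
Step 4 weighs the number of variants against what the affinity separation can detect. The arithmetic can be sketched as follows, assuming (as a simplification not made in the text) that every varied residue may take any of the 20 amino acids with equal weight and ignoring codon degeneracy. The `capacity` argument is a hypothetical parameter; the value 10^6 in the usage line merely echoes the population size mentioned above.

```python
def protein_variants(n_residues: int, types_per_residue: int = 20) -> int:
    """Number of distinct protein sequences when n residues vary freely."""
    return types_per_residue ** n_residues


def max_residues_within(capacity: int, types_per_residue: int = 20) -> int:
    """Largest number of simultaneously varied residues whose variant
    count does not exceed the given capacity (an assumed limit)."""
    if types_per_residue < 2:
        raise ValueError("need at least two amino acid types per residue")
    n = 0
    while protein_variants(n + 1, types_per_residue) <= capacity:
        n += 1
    return n


n_max = max_residues_within(10**6)
```

Because the count grows as a power of the number of varied residues, adding even one residue to the subset multiplies the library size twentyfold under these assumptions.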

Part III is repeated for each new target material.
Part I need be repeated only if no GP(IPBD) suitable to
a chosen target is available. Part II need be repeated
for each newly-developed GP(IPBD) and for previously-
developed GP(IPBD)s if the intended conditions of use
of a novel binding protein differ significantly from
the conditions of previous optimizations.

The following abbreviations will be used
throughout the present invention:

Abbreviation   Meaning

GP             Genetic Package, e.g. a bacteriophage

X              Any protein

x              The gene for protein X

IPBD           Initial Potential Binding Domain, e.g. BPTI

PBD            Potential Binding Domain, e.g. a derivative
               of BPTI

SBD            Successful Binding Domain, e.g. a derivative
               of BPTI selected for binding to a target

PPBD           Parental Potential Binding Domain, i.e. an
               IPBD or an SBD from a previous selection

OSP            Outer Surface Protein, e.g. coat protein of
               a phage or LamB from E. coli

OSP-PBD        Fusion of an OSP and a PBD, order of fusion
               not specified

OSTS           Outer Surface Transport Signal

GP(x)          A genetic package containing the x gene

GP(X)          A genetic package that displays X on its
               outer surface

[illegible]    An affinity matrix supporting a given
               molecule, e.g. T4 lysozyme attached to an
               affinity matrix

AfM(W)         A molecule having affinity for "W", e.g.
               trypsin is an AfM(BPTI)

[illegible]    A chemical that can induce expression of a
               gene, e.g. IPTG for the lac promoter

OCV            Operative cloning vector

K_T            K_T = [T][SBD]/[T:SBD] (T is a target)

K_N            K_N = [N][SBD]/[N:SBD] (N is a non-target)

[illegible]    Density of AfM(W) on affinity matrix

OMP            Outer membrane protein

nt             nucleotide

K_d            A bimolecular dissociation constant,
               K_d = [A][B]/[A:B]

S_err          Error level in synthesizing vgDNA
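
As a worked illustration of the dissociation constant defined in the table, the fraction of A held in the complex A:B follows directly from K_d = [A][B]/[A:B]: [A:B]/([A] + [A:B]) = [B]/(K_d + [B]). The short Python sketch below evaluates that expression; it assumes simple 1:1 binding at equilibrium with [B] taken as the free concentration, and the numerical values are arbitrary examples.

```python
def fraction_bound(free_b: float, kd: float) -> float:
    """Fraction of A found in the A:B complex at equilibrium.

    Derived from K_d = [A][B]/[A:B]; free_b and kd share the same units.
    """
    if free_b < 0 or kd <= 0:
        raise ValueError("free_b must be >= 0 and kd must be > 0")
    return free_b / (kd + free_b)


# When the free concentration equals K_d, half of A is bound.
half = fraction_bound(1e-9, 1e-9)
```

A smaller K_d therefore means a larger bound fraction at any given free concentration, which is the sense in which the tables above compare K_T with K_N.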
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lMa......g...3,; aj^g^ai^quencin q aaj&g&i. 

The present invention is not limited to a single
method of determining the sequence of nucleotides (nts)
in DNA subsequences. Sequencing reactions, agarose gel
electrophoresis, and polyacrylamide gel electrophoresis
(PAGE) are performed by standard procedures.

The present invention is not limited to a single
method of determining protein sequences, and reference
in the appended claims to determining the amino acid
sequence of a domain is intended to include any
practical method or combination of methods, whether
direct or indirect. The preferred method, in most
cases, is to determine the sequence of the DNA that
encodes the protein and then to infer the amino acid
sequence. In some cases, standard methods of protein-
sequence determination may be needed to detect post-
translational processing.
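
Inferring an amino acid sequence from the encoding DNA amounts to reading codons through the genetic code. A minimal Python sketch follows, assuming the standard genetic code, a reading frame that starts at the first base, and translation that stops at the first stop codon; as noted above, this cannot reveal post-translational processing. The example sequence is invented.

```python
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO)
}


def translate(dna: str) -> str:
    """Translate DNA in frame 0, stopping at the first stop codon."""
    dna = dna.upper()
    protein = []
    for i in range(0, len(dna) - 2, 3):
        aa = CODON_TABLE[dna[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


peptide = translate("ATGGCCTAA")
```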

The major steps in the process of making and
isolating a novel binding protein with affinity for a
chosen target material are illustrated in Figure 2.

Sec. 1: Displaying a Heterologous Binding Domain on the GP
Surface:

Sec. 1.0: General Requirements for Genetic Packages:

It is emphasized that the GP on which selection-
through-binding will be practiced must be capable,
after the selection, either of growth in some suitable
environment or of in vitro amplification and recovery
of the encapsulated genetic message. During at least
part of the growth, the increase in number must be
approximately exponential with respect to time. The
component of a population that exhibits the desired
binding properties may be quite small, for example, one
in 10^6 or less. Once this component of the population
is separated from the non-binding components, it must
be possible to amplify it. Culturing viable cells is
the most powerful amplification of genetic material
known and is preferred. Genetic messages can also be
amplified in vitro, but this is not preferred.
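
The value of exponential growth for recovering a rare component can be made concrete with a small calculation. The Python sketch below counts the doublings needed to grow from a starting number to a target number, assuming ideal doubling with no losses; the 30-minute doubling time used in the example is an illustrative assumption, not a measured value.

```python
def doublings_needed(start: int, target: int) -> int:
    """Number of doublings for `start` entities to reach at least `target`."""
    if start < 1:
        raise ValueError("start must be at least 1")
    n, count = 0, start
    while count < target:
        count *= 2
        n += 1
    return n


def hours_required(doublings: int, minutes_per_doubling: float) -> float:
    """Elapsed time in hours for the given number of doublings."""
    return doublings * minutes_per_doubling / 60


n = doublings_needed(1, 10**6)          # one survivor grown back to 10^6
t = hours_required(n, 30)               # assumed 30-minute doubling time
```

Under these assumptions a single recovered package is restored to a population of a million in twenty doublings, which is why a component as rare as one in 10^6 remains usable after separation.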

A GP may typically be a vegetative bacterial cell,
a bacterial spore or a bacterial DNA virus. A strain
of any living cell or virus is potentially useful if
the strain can be:

1) maintained in culture,

2) affinity separated and retain its viability,

3) genetically altered with reasonable facility,
and

4) manipulated to display the potential binding
protein domain where it can interact with the
target material during affinity separation.

DNA encoding the IPBD sequence may be operably
linked to DNA encoding at least the outer surface
transport signal of an outer surface protein (OSP)
native to the GP so that the IPBD is displayed on the
outer surface of the GP. It should be possible to
cause a genetic package to display the IPBD or PBD on
its outer surface without adversely affecting the
viability of the GP or the binding characteristics of
the IPBD or PBD, if the fusion is near domain
boundaries (BECK83, CRAW87, TOTH8S, SMIT85, MANO86; and
cf. ROSS81, HOLL83).



Those characteristics of a protein that are
recognized by a cell and that cause it to be
transported out of the cytoplasm and displayed on the
cell surface will be termed "outer-surface transport
signals".

The replicable genetic entity (phage or plasmid)
that carries the osp-pbd genes (derived from the osp-
ipbd gene) through the selection-through-binding
process, see Sec. 14, is referred to hereinafter as the
operative cloning vector (OCV). When the OCV is a
phage, it may also serve as the genetic package. The
choice of a GP is dependent in part on the availability
of a suitable OCV and suitable OSP.

Preferably, the GP is readily stored, for example,
by freezing. If the GP is a cell, it should have a
short doubling time, such as 20-40 minutes. If the GP
is a virus, it should be prolific, e.g., a burst size
of at least 100/infected cell. GPs which are finicky
or expensive to culture are disfavored. The GP should
be easy to harvest, preferably by centrifugation. The
GP is preferably stable for a temperature range of -70
to 42°C (stable at 4°C for several days or weeks);
resistant to shear forces found in HPLC; insensitive to
UV; tolerant of desiccation; and resistant to a pH of
2.0 to 10.0, surface active agents such as SDS or
Triton, chaotropes such as 4M urea or 2M guanidinium
HCl, common ions such as K+, Na+, and SO4--, common
organic solvents such as ether and acetone, and
degradative enzymes. Finally, there must be a suitable
OCV (see Sec. 3).

Preferably, the 3D structure of the OSP, and the
sequence of the OSP gene, are known. If the 3D
structure is not known, there is preferably knowledge
of which residues are exposed on the cell surface, the
location of the domain boundaries within the OSP,
and/or of successful fusions of the OSP and a foreign
insert. The OSP preferably appears in numerous copies
on the outer surface of the GP, and preferably serves a
non-essential function. It is desirable that the OSP
not be post-translationally processed, or at least that
[text missing in source].

The preferred GP, OCV and OSP are those for which
the fewest serious obstacles can be seen, rather than
the one that scores highest on any one criterion.

Next, we consider general answers to the questions
posed in this step for the cases of: a) vegetatively
growing bacterial cells (Sec. 1.1), b) bacterial spores
(Sec. 1.2), and c) phage (Sec. 1.3). Preferred OSPs for
several GPs are given in Table 2.

Sec. 1.1: Bacterial Cells as Genetic Packages:

One may choose any well-characterized bacterial
strain which may be grown in culture. The important
questions in this case are: a) do we know enough about
mechanisms that localize proteins on the outside of the
cell, b) will the IPBD fold in the environment of the
outer membrane, and c) will cells change expression of
osp-pbd, derived from osp-ipbd, during affinity
separation? Some IPBDs may need large or insoluble
prosthetic groups, such as an Fe4S4 cluster, that are
available within the cell, but not in the medium. The
formation of Fe4S4 clusters found in some ferredoxins
is catalyzed by enzymes found in the cell (BONO85).
IPBDs that require such prosthetic groups may fail to
fold or function if displayed on bacterial cells.

Sec. 1.1.1: Preferred Bacterial Cells as GP:

In view of the extensive knowledge of E. coli, a
strain of E. coli, defective in recombination, is the
strongest candidate as a bacterial GP. Other preferred
candidates are Salmonella typhimurium, Bacillus
subtilis, and Pseudomonas aeruginosa.

Sec. 1.1.2: Preferred Outer Surface Proteins for
Displaying IPBD on Bacterial Cells:

Gram-negative bacteria have outer-membrane
proteins (OMP), that form a subset of OSPs. Many OMPs
span the membrane one or more times. The signals that
cause OMPs to localize in the outer membrane are
encoded in the amino acid sequence of the mature
protein. Fusions of fragments of omp genes with
fragments of an x gene have led to X appearing on the
outer membrane (BENS84, cmmt). If no fusion data are
available, then we fuse an ipbd fragment to various
fragments of the osp gene and obtain GPs that display
the osp-ipbd fusion on the cell outer surface by
screening or selection for the display-of-IPBD
phenotype.

Oliver has reviewed mechanisms of protein
secretion in bacteria (OLIV85 and OLIV87). Nikaido and
Vaara (NIKA87) have reviewed mechanisms by which
proteins become localized to the outer membrane of
Gram-negative bacteria. For example, the LamB protein
of E. coli is synthesized with a typical signal-
sequence which is subsequently removed. Benson et al.
(BENS84) showed that LamB-LacZ fusion proteins would be
deposited in the outer membrane of E. coli when
residues 1-49 of the mature LamB protein are included
in the fusion, but that residues 1-43 are insufficient.

LamB of E. coli is a porin for maltose and
maltodextrin transport, and serves as the receptor for
adsorption of bacteriophages lambda and K10. This
protein has been purified to homogeneity (ends? a) and
shown to function as a trimer (PALV79). Mutations to
phage resistance have been used to define the parts of
the LamB protein that adsorb each phage (EOMSSQ,
CLEM83, SEERS?).

Topological models have been developed that
describe the function of phage receptor and
maltodextrin transport. The models describe these
domains and their locations with respect to the
surfaces of the outer membrane (CLEM81, CLEM83, CHAP84,
BETHSS).

LamB is transported to the outer membrane if a
functional N-terminal sequence is present; further, the
first 49 amino acids of the mature sequence are
required for successful transport (BENS84). Homology
between parts of LamB protein and other outer membrane
proteins OmpC, OmpF and PhoE has been detected
(NIKA84), including homology between LamB amino acids
39-49 and sequences of the other proteins. These
subsequences may label the proteins for transport to
the outer membrane. Further, monoclonal antibodies
derived from mice immunized with purified LamB have
been used to characterize four distinct topological and
functional regions, two of which are concerned with
maltose transport (GABA82).

Sec. 1.1.3: Choice of Insertion Site for IPBD in
Bacterial OSP:

For fusions of phoA into the coding sequence
for an integral membrane protein, the PhoA domain is
localized according to where in the integral membrane
protein the phoA gene was inserted (BECK83 and MANO86).
That is, if phoA is inserted after an amino acid which
normally is found in the cytoplasm, then PhoA appears
in the cytoplasm. If phoA is inserted after an amino
acid normally found in the periplasm, however, then the
PhoA domain is localized on the periplasmic side of the
membrane, and anchored in it. Beckwith and colleagues
(BECK88) have extended these observations to the lacZ
gene that can be inserted into genes for integral
membrane proteins such that the LacZ domain appears in
either the cytoplasm or the periplasm according to
where the gene was inserted.

OSP-IPBD fusion proteins need not fill a
structural role in the outer membranes of Gram-negative
bacteria because parts of the outer membranes are not
highly ordered. For large OSPs there is likely to be
one or more sites at which osp can be truncated and
fused to ipbd such that cells expressing the fusion
will display IPBDs on the cell surface. If fusions
between fragments of osp and x have been shown to
display X on the cell surface, we can design an osp-
ipbd gene by substituting ipbd for x in the DNA
sequence. Otherwise, a successful OMP-IPBD fusion is
preferably sought by fusing fragments of the best omp
to an ipbd, expressing the fused gene, and testing the
resultant GPs for the display-of-IPBD phenotype. We use
the available data about the OMP to pick the point or
points of fusion between omp and ipbd to maximize the
likelihood that IPBD will be displayed. Alternatively,
we truncate osp at several sites or in a manner that
produces osp fragments of variable length and fuse the
osp fragments to ipbd; cells expressing the fusion are
screened or selected which display IPBDs on the cell
surface. An additional alternative is to include short
segments of random DNA in the fusion of omp fragments
to ipbd and then screen or select the resulting
variegated population for members exhibiting the
display-of-IPBD phenotype.

The promoter for the osp-ipbd gene, preferably, is
subject to regulation by a small chemical inducer, such
as isopropyl thiogalactoside (IPTG) (the lacUV5
promoter). It need not come from a natural osp gene;
any regulatable bacterial promoter can be used (MANI82).

Once a genetic packaging system employing
vegetative bacterial cells has been designed, it is
time to choose an IPBD (Sec. 2).

Sec. 1.1.4: In Vivo Selection for Pseudo-osp Gene From
Random DNA Inserts in E. coli:

As an alternative to choosing a natural OSP and an
insertion site in the OSP, we can construct a gene
comprising: a) a regulatable promoter (e.g. lacUV5),
b) [illegible in source], c) a periplasmic transport
signal sequence, d) a fusion of the ipbd gene with a
segment of random DNA (as in Kaiser et al. (KAIS87)),
e) [illegible in source], and f) a transcriptional
terminator. [Text missing in source] which preferably
comprises [illegible]-300 [text missing in source]
numerous potential OSTSs (cf. KAIS87). The fusion of
ipbd and the random DNA could be in either order, but
ipbd upstream is slightly preferred. Isolates from the
population generated in this way can be screened for
display of the IPBD. Preferably, a version of
selection-through-binding is used to select GPs that
display IPBD on the GP surface, and thus contain a DNA
insert encoding a functional OSTS. Alternatively,
clonal isolates of GPs may be screened for the
display-of-IPBD phenotype.

The preference for ipbd upstream of the random DNA
arises from consideration of the manner in which the
successful GP(IPBD) will be used. In Part III, we will
introduce numerous mutations into the pbd region of the
gene, some of which might include gratuitous stop
codons. If pbd precedes the random DNA, then
gratuitous stop codons in pbd lead to no OSP-PBD
protein appearing on the cell surface. If pbd follows
the random DNA, then gratuitous stop codons in pbd
might lead to incomplete OSP-PBD proteins appearing on
the cell surface. Incomplete proteins often are non-
specifically sticky so that GPs displaying incomplete
PBDs are easily removed from the population.
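
How often gratuitous stop codons arise can be estimated with a simple model. The Python sketch below assumes, purely for illustration, that each varied codon is drawn uniformly from all 64 triplets, of which 3 are stop codons in the standard genetic code; the actual nucleotide mixtures used for variegation need not be uniform, so the numbers are indicative only.

```python
def prob_no_stop(n_codons: int) -> float:
    """Probability that n uniformly random codons contain no stop codon."""
    if n_codons < 0:
        raise ValueError("n_codons must be non-negative")
    return (61 / 64) ** n_codons


def expected_fraction_truncated(n_codons: int) -> float:
    """Fraction of sequences expected to carry at least one stop codon."""
    return 1.0 - prob_no_stop(n_codons)


frac_5 = expected_fraction_truncated(5)    # five fully random codons
frac_30 = expected_fraction_truncated(30)  # thirty fully random codons
```

The truncated fraction rises steadily with the number of randomized codons under this model, which is why the fate of truncated products matters when deciding the order of the fused segments.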

Bacterial spores have desirable properties as GP
candidates. Bacillus spores neither actively
metabolize nor alter the proteins on their surface.
However, spores are much more resistant than vegetative
bacterial cells or phage to chemical and physical
agents. Spores have the disadvantage that the
molecular mechanisms that trigger sporulation are less
well worked out than is the formation of M13 or the
export of protein to the outer membrane.

Sec. 1.2.1: Preferred Bacterial Spores as GPs:

Bacteria of the genus Bacillus form endospores
that are extremely resistant to damage by heat,
radiation, desiccation, and toxic chemicals (reviewed
by Losick et al. (LOSI86)). These spores have complex
structure and morphogenesis that is species-specific
and only partially elucidated. The following
observations are relevant to the use of Bacillus spores
as genetic packages.

Plasmid DNA is commonly included in spores.
Plasmid encoded proteins have been observed on the
surface of Bacillus spores (DEBR86). Sporulation
involves complex temporal regulation that is moderately
well understood (LOSI86). The sequences of several
sporulation promoters are known; coding sequences
operatively linked to such promoters are expressed only
during sporulation (RAYC87).

Donovan et al. have identified several polypeptide
components of B. subtilis spore coat (DONO87); the
sequences of two complete coat proteins and amino-
terminal fragments of two others have been determined.
Some components of the spore are synthesized in the
forespore, e.g. small acid-soluble spore proteins
(SPRI8S), while other components are synthesized in the
mother cell and appear in the spore (e.g. the coat
proteins). This spatial organization of synthesis is
controlled at the transcriptional level.



Spores self-assemble, but the signals that cause
various proteins to localize in different parts of the
spore are not well understood; presumably, the signals
controlling deposition of the coat proteins from the
cytoplasm of the mother cell onto the spore coat are
embedded in the polypeptide sequence. Some, but not
all, of the coat proteins are synthesized as precursors
and are then processed by specific proteases before
deposition in the spore coat. Viable spores
that differ only slightly from wild-type are produced
in B. subtilis even if any one of four coat proteins is
missing (DONO87). Disulfide bonds form within the
spore (thiol reducing agents are needed to solubilize
several of the proteins of the coat). The 12kd coat
protein, CotD, contains 5 cysteines. CotD also
contains an unusually high number of histidines (16)
and prolines (7). The 11kd coat protein, CotC,
contains only one cysteine and one methionine. CotC
has a very unusual amino-acid sequence with 19 lysines
(K) appearing as 9 K-K dipeptides and one isolated K.
There are also 20 tyrosines (Y) of which 10 appear as 5
Y-Y dipeptides. Peptides rich in Y and K are known to
become crosslinked in oxidizing environments (DEVO78,
WAIT83, WAIT86). CotC contains 16 D and E amino acids
that nearly equals the 19 Ks. There are no h, R, X,
h, *, P, Q, S, or W amino acids in CotC. Neither CotC
nor CotD is post-translationally cleaved. The proteins
CotA and CotB are post-translationally cleaved.
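
Composition statistics of the kind quoted for CotC (totals, dipeptides, isolated residues) are easy to tally from a sequence. The Python sketch below shows one way to do so; it counts non-overlapping pairs within each run of a residue. The sequence used is a made-up toy string, not the CotC sequence, which is not reproduced in this text.

```python
from itertools import groupby
from typing import Tuple


def dipeptide_tally(seq: str, residue: str) -> Tuple[int, int, int]:
    """Return (total, pairs, isolated) for one residue type.

    Each run of n consecutive copies contributes n // 2 pairs and
    n % 2 isolated residues.
    """
    total = pairs = isolated = 0
    for aa, run in groupby(seq):
        if aa != residue:
            continue
        n = len(list(run))
        total += n
        pairs += n // 2
        isolated += n % 2
    return total, pairs, isolated


toy = "KKAYYKGKK"  # invented example, not a real coat protein
k_stats = dipeptide_tally(toy, "K")
y_stats = dipeptide_tally(toy, "Y")
```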
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species may require days or weeks to sporulate. In 
addition, genetic knowledge and manipulation is much
more developed for B. subtilis than for other spore-
forming bacteria. Thus, Bacillus spores are preferred
over [illegible in source] spores. Bacteria of the genus
Clostridium also form very durable endospores, but
clostridia, being strict anaerobes, are not convenient
to culture. The choice of a species of Bacillus is
governed by knowledge and availability of cloning
systems and by how easily sporulation can be
controlled. A particular strain is chosen by the
criteria listed in Sec. 1.0. Many vegetative
biochemical pathways are shut down when sporulation
begins so that prosthetic groups might not be
available.

Sec. 1.2.2: Preferred Outer-Surface Proteins for
Displaying IPBD on Spores:

If a spore is chosen as GP, the promoter is the
most important part of the osp gene, because the
promoter of a spore coat protein is most active: a)
when spore coat protein is being synthesized and
deposited onto the spore and b) in the specific place
that spore coat proteins are being made. In B.
subtilis, some of the spore coat proteins are post-
translationally processed by specific proteases. It is
valuable to know the sequences of precursors and mature
coat proteins so that we can avoid incorporating the
recognition sequence of the specific protease into our
construction of an OSP-IPBD fusion. The sequence of a
mature spore coat protein contains information that
causes the protein to be deposited in the spore coat;
thus gene fusions that include some or all of a mature
coat protein sequence are preferred for screening or
selection for the display-of-IPBD phenotype.

Fusions of ipbd fragments to cotC or cotD
fragments are likely to cause IPBD to appear on the
spore surface. The genes cotC and cotD are preferred
osp genes because CotC and CotD are not post-
translationally cleaved. Subsequences from cotA or
cotB could also be used to cause an IPBD to appear on
the surface of B. subtilis spores, but we must take the
post-translational cleavage of these proteins into
account. DNA encoding IPBD could be fused to a
fragment of cotA or cotB at either end of the coding
region or at sites interior to the coding region.
Spores could then be screened or selected for the
display-of-IPBD phenotype.

To date, no Bacillus sporulation promoter has been
shown to be inducible by an exogenous chemical inducer
as the lac promoter of E. coli. Nevertheless, the
quantity of protein produced from a sporulation
promoter can be controlled by other factors, such as
the DNA sequence around the Shine-Dalgarno sequence or
[text missing in source].

Sec. 1.2.3: Choice of Insertion Site for IPBD
[remainder of heading illegible in source]:



The considerations governing insertion site in the
spore OSP are the same as those given in Section 1.1.3.

Although the considerations for spores are nearly
identical to the considerations for vegetative
bacterial cells (Sec. 1.1), the available information
on the mechanisms that cause proteins to appear on
spores is meager so that use of the random-DNA approach
becomes a more attractive option.

We can use the approach described above at 1.1.4
for attaching an IPBD to an E. coli cell, except that:
a) a sporulation promoter is used, and b) no
periplasmic signal sequence should be present.

Sec. 1.3: Phages as GPs:

Unlike bacterial cells and spores, choice of a
phage depends strongly on knowledge of the 3D structure
of an OSP and how it interacts with other proteins in
the capsid. The size of the phage genome and the
packaging mechanism are also important because the
phage genome itself is the cloning vector. The osp-
ipbd gene must be inserted into the phage genome;
therefore:

1) the virion must be capable of accepting the
insertion or substitution of genetic material, and

2) the genome of the phage must be small enough to
allow convenient manipulation.



Additional considerations in choosing phage are: 1)
the morphogenetic pathway of the phage determines the
environment in which the IPBD will have opportunity to
fold, 2) IPBDs containing essential disulfides may not
fold within a cell, 3) IPBDs needing large or insoluble
prosthetic groups may not fold if secreted because the
prosthetic group is lacking, and 4) when variegation is
introduced in Part III, multiple infections could
generate hybrid GPs that carry the gene for one PBD but
have at least some copies of a different PBD on their
surfaces; it is preferable to minimize this
possibility.
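
The risk of hybrid GPs from multiple infection can be gauged with a standard statistical model. The Python sketch below assumes that the number of phage infecting a cell follows a Poisson distribution with mean equal to the multiplicity of infection (MOI); this modeling assumption and the MOI values are introduced here for illustration and are not taken from the text.

```python
import math


def fraction_multiply_infected(moi: float) -> float:
    """Among infected cells, the fraction infected by two or more phage,
    assuming Poisson-distributed infections with mean `moi`."""
    if moi <= 0:
        raise ValueError("moi must be positive")
    p0 = math.exp(-moi)   # cells with no phage
    p1 = moi * p0         # cells with exactly one phage
    return (1.0 - p0 - p1) / (1.0 - p0)


low = fraction_multiply_infected(0.01)
high = fraction_multiply_infected(1.0)
```

Under this model, keeping the MOI well below one is a direct way to keep multiply infected cells, and hence hybrid packages, rare.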



Bacteriophages are excellent candidates for GPs
because there is little or no enzymatic activity
associated with intact mature phage, and because the
genes are inactive outside a bacterial host, rendering
the mature phage particles metabolically inert. The
filamentous phage M13 and bacteriophage PhiX174 are of
particular interest.

Filamentous phage:

20 

The entire life cycle of the filamentous phage
M13, a common cloning and sequencing vector, is well
understood. M13 and f1 are so closely related that we
consider the properties of each relevant to both
(RASC86); any differentiation is for historical
accuracy. The genetic structure (the complete sequence
(SCHA78), the identity and function of the ten genes,
and the order of transcription and location of the
promoters) of M13 is well known, as is the physical
structure of the virion (BANN81, BOEK80, CHAN79,
ITOK79, KAPL78, KUHN85b, KUHN87, MAKO80, MARV78,
MESS78, OHKA81, RASC86, RUSS81, SCHA78, SMIT85, WEBS78,
and ZIMM82); see RASC86 for a recent review of the
structure and function of the coat proteins.
Relevant facts about M13 are disclosed in Example 1.
Bacteriophage PhiX174:


The bacteriophage PhiX174 is a very small
icosahedral virus which has been thoroughly studied by
genetics, biochemistry, and electron microscopy (see
The Single-Stranded DNA Phages (DENH78)). To date, no
proteins from PhiX174 have been studied by X-ray
diffraction. PhiX174 is not used as a cloning vector
because PhiX174 can accept almost no additional DNA;
the virus is so tightly constrained that several of its
genes overlap. Chambers et al. (CHAM82) showed that
mutants in gene G are rescued by the wild-type G gene
carried on a plasmid so that the host supplies this
protein.

Three gene products of PhiX174 are present on the
outside of the mature virion: F (capsid), G (major
spike protein, 60 copies per virion), and H (minor
spike protein, 12 copies per virion). The G protein
comprises 175 amino acids, while H comprises 328 amino
acids. The F protein interacts with the single-stranded
DNA of the virus. The proteins F, G, and H
are translated from a single mRNA in the virally infected
cells.

Large DNA Phages:

Phage such as lambda or T4 have much larger
genomes than do M13 or PhiX174. Large genomes are less
conveniently manipulated than small genomes. A phage
with a large genome, however, could be used if genetic
manipulation is sufficiently convenient. Phage such as
lambda and T4 have more complicated 3D capsid
structures than M13 or PhiX174, with more OSPs to
choose from. Phage lambda virions and phage T4 virions
form intracellularly, so that IPBDs requiring large or
insoluble prosthetic groups might fold on the surfaces
of these phage. Phage lambda and phage T4 are not
preferred; however, derivatives of these phages could
be constructed to overcome these disadvantages.

RNA Phages:

RNA phage, such as Qbeta, are not preferred
because manipulation of RNA is much less convenient
than is the manipulation of DNA. Although competent
RNA bacteriophage are not preferred, useful genetically
altered RNA-containing particles could be derived from
RNA phage, such as MS2.

To use MS2 as a GP, we would need to eliminate
most of the natural viral genome so that an osp-ipbd
gene could fit into the protein capsid. It is known
that the A protein binds sequence-specifically to a
site at the 5' end of the + RNA strand, triggering
formation of RNA-containing particles if coat protein
is present. If a message containing the A protein
binding site and the gene for a chimera of coat protein
and a PBD were produced in a cell that also contained A
protein and wild-type coat protein (both produced from
regulated genes on a plasmid), then the RNA coding for
the chimeric protein would get packaged. A package
comprising RNA encapsulated by proteins encoded by that
RNA satisfies the major criterion that the genetic
message inside the package specifies something on the
outside. The particles by themselves are not viable.
After isolating the packages that carry an SBD, we
would need to:

1) separate the RNA from the protein capsid,

2) reverse transcribe the RNA into DNA, using AMV
or MMTV reverse transcriptase, and

3) amplify the DNA by several cycles of polymerase
chain reaction (PCR) until there is enough to
subclone the recovered genetic message into a
plasmid for sequencing and further work.

Alternatively, helper phage could be used to rescue the
isolated phage.
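
The amplification in step 3 can be budgeted with simple
arithmetic. The following is a minimal sketch, not part of
the disclosed procedure: it assumes each PCR cycle multiplies
the template by (1 + efficiency), with efficiency = 1.0 being
the idealized case of perfect doubling. The function name and
the copy numbers in the usage note are hypothetical.

```python
def pcr_cycles_needed(start_copies, target_copies, efficiency=1.0):
    """Return the smallest number of cycles n such that
    start_copies * (1 + efficiency)**n >= target_copies.

    efficiency = 1.0 means perfect doubling per cycle, which is an
    idealization; real reactions are less efficient.
    """
    if start_copies <= 0 or target_copies <= 0:
        raise ValueError("copy numbers must be positive")
    if not 0 < efficiency <= 1:
        raise ValueError("efficiency must be in (0, 1]")
    cycles = 0
    copies = float(start_copies)
    # Multiply cycle by cycle to avoid rounding trouble with logarithms.
    while copies < target_copies:
        copies *= 1 + efficiency
        cycles += 1
    return cycles
```

For instance, under the ideal-doubling assumption, going from
1e3 recovered templates to 1e9 copies takes
pcr_cycles_needed(1e3, 1e9) cycles; a lower assumed
efficiency raises the count.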

Preferred outer-surface proteins for bacteriophage:

For a given bacteriophage, the preferred OSP is
usually one that is present on the phage surface in the
largest number of copies, as this allows the greatest
flexibility in varying the ratio of OSP-IPBD to wild
type OSP and also gives the highest likelihood of
obtaining satisfactory affinity separation. Moreover,
a protein present in only one or a few copies usually
performs an essential function in morphogenesis or
infection; mutating such a protein by addition or
insertion is likely to result in reduction in viability
of the GP.

It is preferred that the wild-type osp gene be
preserved. The ipbd gene fragment may be inserted
either into a second copy of the recipient osp gene or
into a novel engineered osp gene. The preferred OSP
for use when the GP is M13 is the gene III protein (see
Example 1).



Sec. 1.3.2: Insertion site for ipbd in osp:


The user must choose a site in the candidate OSP
gene for inserting an ipbd gene fragment. The coats of
most bacteriophage are highly ordered. Thus, in
bacteriophage, unlike the cases of bacteria and spores,
it is important to retain most or all of the residues
of the parental OSP in engineered OSP-IPBD fusion
proteins. A preferred site for insertion of the ipbd
gene into the phage osp gene is one in which: a) the
IPBD folds into its original shape, b) the OSP domains
fold into their original shapes, and c) there is no
interference between the two domains.



If there is a 3D model of the phage that indicates
that either the amino or carboxy terminus of an OSP is
exposed to solvent, then the exposed terminus of that
mature OSP becomes the prime candidate for insertion of
the ipbd gene. A low-resolution 3D model suffices.

In the absence of a 3D structure, the amino and
carboxy termini of the mature OSP are the best
candidates for insertion of the ipbd gene. A
functional fusion may require additional residues
between the IPBD and OSP domains to avoid unwanted
interactions between the domains. Random-sequence DNA,
or DNA coding for a specific sequence of a protein
homologous to the IPBD or OSP, can be inserted between
the osp fragment and the ipbd fragment if needed.



Fusion at a domain boundary within the OSP is also
a good approach for obtaining a functional fusion.
Smith exploited such a boundary when subcloning
heterologous DNA into gene III of f1 (SMIT85).

There are several methods of identifying domains.
Methods that rely on atomic coordinates have been
reviewed by Janin and Chothia (JANI85); see also ROSE85,
RASH84, VITA84, PABO79, POTE83, and SCOT87.

If the only structural information available is
the amino acid sequence of the candidate OSP, we use
the sequence to predict turns and loops. There is a
high probability that some of the loops and turns will
be correctly predicted (Chou and Fasman, CHOU74);
these locations are also candidates for insertion of
the ipbd gene fragment.
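
As an illustration of sequence-based turn prediction, the
sketch below scans a sequence with a sliding window and
reports windows whose mean turn propensity exceeds a cutoff.
This is a simplification of our own, not the full
Chou-Fasman procedure (which also uses position-specific
frequencies). The propensity values are those commonly
tabulated for the method and are reproduced here only for
illustration; they should be checked against the published
tables before use. The window size and cutoff are assumptions.

```python
# Turn propensities P(t) per residue, as commonly tabulated for the
# Chou-Fasman method. Illustrative only: verify before relying on them.
P_TURN = {
    "N": 1.56, "G": 1.56, "P": 1.52, "D": 1.46, "S": 1.43,
    "C": 1.19, "Y": 1.14, "K": 1.01, "Q": 0.98, "T": 0.96,
    "W": 0.96, "R": 0.95, "H": 0.95, "E": 0.74, "A": 0.66,
    "M": 0.60, "F": 0.60, "L": 0.59, "V": 0.50, "I": 0.47,
}


def candidate_turn_windows(sequence, window=4, threshold=1.0):
    """Return 0-based start indices of windows whose mean turn
    propensity is above `threshold` (simplified screen, see lead-in)."""
    seq = sequence.upper()
    unknown = set(seq) - set(P_TURN)
    if unknown:
        raise ValueError(f"unknown residue codes: {sorted(unknown)}")
    hits = []
    for start in range(len(seq) - window + 1):
        residues = seq[start:start + window]
        mean_p = sum(P_TURN[aa] for aa in residues) / window
        if mean_p > threshold:
            hits.append(start)
    return hits
```

Windows reported by candidate_turn_windows mark stretches of
the OSP sequence that could be examined as possible insertion
sites; they are candidates only and require experimental
confirmation.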

Sec. 1.3.4: In vivo selection:

Alternatively, a functional insertion site may be
determined by generating a number of recombinant
constructions and selecting the functional strain by
phenotypic characteristics. Because the OSP-IPBD must
fulfill a structural role in the phage coat, it is
unlikely that any particular random DNA sequence
coupled to the ipbd gene will produce a fusion protein
that fits into the coat in a functional way.
Nevertheless, random DNA inserted between large
fragments of a coat protein gene and the ipbd gene will
produce a population that is likely to contain one or
more members that display the IPBD on the outside of a
viable phage. A display probe, similar to that described
in Sec. 1.1.4, is constructed and random DNA sequences
cloned into appropriate sites.

An IPBD may be chosen from naturally occurring
proteins or domains of naturally occurring proteins, or
may be designed from first principles. A designed
protein may have advantages over natural proteins if:
a) the designed protein is more stable, b) the designed
protein is smaller, and c) the charge distribution of
the designed protein can be specified more freely.


A candidate IPBD must meet the following criteria:
1) stability under the conditions of its intended use
(the domain may comprise the entire protein that will
be inserted, e.g. BPTI), 2) knowledge of the amino acid
sequence is obtainable, 3) identification of the
residues on the outer surface, and their spatial
relationships, and 4) availability of a molecule,
AfM(IPBD), having high specific affinity for the IPBD.

Preferably, the IPBD is no larger than necessary
because it is easier to arrange restriction sites in
smaller amino-acid sequences. The usefulness of
candidate IPBDs that meet all of these requirements
depends on the availability of the information
discussed below.

Information used to judge IPBD suitability
includes: 1) a 3D structure (knowledge strongly
preferred), 2) one or more sequences homologous to the
IPBD (the more homologous sequences known, the better),
3) the pI of the IPBD (knowledge necessary in some
cases), 4) the stability and solubility as a function
of temperature, pH and ionic strength (preferably known
to be stable over a wide range and soluble in
conditions of intended use), 5) ability to bind metal



wo mminm 



PCT/US89/03731



50 

ions such as Ca++ or Mg++ (knowledge preferred; binding
per se, no preference), 6) enzymatic activities, if any
(knowledge preferred, activity per se has uses but may
cause problems), 7) binding properties, if any
(knowledge preferred, specific binding also preferred),
8) availability of a molecule having specific and
strong affinity (Kd < 10^-11 M) for the IPBD
(preferred), 9) availability of a molecule having
specific and medium affinity (10^-8 M < Kd < 10^-6 M)
for the IPBD (preferred), 10) the sequence of a mutant
of IPBD that does not bind to the affinity molecule(s)
(preferred), and 11) absorption spectrum in visible,
UV, NMR, etc. (characteristic absorption preferred).

If only one species of molecule having affinity
for IPBD (AfM(IPBD)) is available, it will be used to:
a) detect the IPBD on the GP surface, b) optimize
expression level and density of the affinity molecule
on the matrix (Sec. 10.1), and c) determine the
efficiency and sensitivity of the affinity separation
(Secs. 10.2 and 10.3). As noted above, however, one
would prefer to have available two species of
AfM(IPBD), one with high and one with moderate affinity
for the IPBD. The species with high affinity would be
used in initial detection and in determining efficiency
and sensitivity (10.2 and 10.3), and the species with
moderate affinity would be used in optimization (10.1).

For at least 20 candidate IPBDs the above
information is available or is practical to obtain, for
example, bovine pancreatic trypsin inhibitor (BPTI, 58
residues), crambin (46 residues), third domain of
ovomucoid (56 residues), T4 lysozyme (164 residues),
and azurin (128 residues).

Most of the PBDs derived from a PPBD according to
the process of the present invention affect residues
having side groups directed toward the solvent.
Exposed residues can accept a wide range of amino
acids, while buried residues are more limited in this
regard (REID88). Surface mutations typically have only
small effects on melting temperature of the PBD, but
may reduce the stability of the PBD. Hence the chosen
IPBD should have a high melting temperature (60°C
acceptable, the higher the better) and be stable over a
wide pH range (8.0 to 3.0 acceptable; 11.0 to 2.0
preferred), so that the SBDs derived from the chosen
IPBD by mutation and selection-through-binding will
retain sufficient stability. Preferably, the

substitutions in the IPBD yielding the various PBDs do
not reduce the melting point of the domain below 50°C.

Two general characteristics of the target
molecule, size and charge, make certain classes of
IPBDs more likely than other classes to yield
derivatives that will bind specifically to the target.
Because these are very general characteristics, one can
divide all targets into six classes: a) large positive,
b) large neutral, c) large negative, d) small positive,
e) small neutral, and f) small negative. A small
collection of IPBDs, one or a few corresponding to each
class of target, will contain a preferred candidate
IPBD for any chosen target.
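
The six-way classification can be written down directly. The
sketch below is illustrative only: the text gives no numeric
boundary between "large" and "small" targets, so the mass
cutoff used here (large_cutoff_da) is a hypothetical parameter
that the user must choose, and the function names are ours.

```python
from itertools import product

SIZES = ("large", "small")
CHARGES = ("positive", "neutral", "negative")


def target_classes():
    """Enumerate the six size/charge classes in the order a) to f)."""
    return [f"{size} {charge}" for size, charge in product(SIZES, CHARGES)]


def classify_target(mass_da, net_charge, large_cutoff_da=5000.0):
    """Assign a target to one of the six classes.

    large_cutoff_da is an assumed, user-chosen boundary; it is not
    taken from the text. net_charge is the net charge under the
    conditions of intended use.
    """
    size = "large" if mass_da >= large_cutoff_da else "small"
    if net_charge > 0:
        charge = "positive"
    elif net_charge < 0:
        charge = "negative"
    else:
        charge = "neutral"
    return f"{size} {charge}"
```

A collection of IPBDs could then be indexed by the strings
returned from target_classes(), with one or a few candidate
IPBDs stored per class.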

Alternatively, the user may elect to engineer a
GP(IPBD) for a particular target; Sec. 2.1 gives
criteria that relate target size and charge to the
choice of IPBD.

Sec. 2.1: Influence of target size on choice of IPBD:

If the target is a protein or other macromolecule,
a preferred embodiment of the IPBD is a small protein
such as BPTI from Bos taurus (58 residues), crambin
from rape seed (46 residues), or the third domain of
ovomucoid from Coturnix coturnix japonica (Japanese
quail) (56 residues) (PAPA82), because targets from
this class have clefts and grooves that can accommodate
small proteins in highly specific ways. If the target
is a macromolecule lacking a compact structure, such as
starch, it should be treated as if it were a small
molecule. Extended macromolecules with defined 3D
structure, such as collagen, should be treated as large
molecules.


If the target is a small molecule, such as a
steroid, a preferred embodiment of the IPBD is a
protein the size of ribonuclease from Bos taurus (124
residues), ribonuclease from Aspergillus oryzae (104
residues), hen egg white lysozyme from Gallus gallus
(129 residues), azurin from Pseudomonas aeruginosa (128
residues), or T4 lysozyme (164 residues), because such
proteins have clefts and grooves into which the small
target molecules can fit. The Brookhaven Protein Data
Bank contains 3D structures for these proteins. Genes
encoding proteins as large as T4 lysozyme can be
manipulated by standard techniques for the purposes of
this invention.

If the target is a mineral, insoluble in water,
one must consider the nature of the mineral's molecular
surface. Smooth surfaces (such as crystalline
silicon) require medium to large proteins (such as
ribonuclease) as IPBD in order to have sufficient
contact area and specificity. Rough, grooved surfaces
(zeolites) could be bound either by small proteins
(BPTI) or larger proteins (T4 lysozyme).

Sec. 2.2: Influence of target charge on choice of
IPBD:

Electrostatic repulsion between molecules of like
charge can prevent molecules with highly complementary
surfaces from binding. Therefore, it is preferred
that, under the conditions of intended use, the IPBD
and the target molecule either have opposite charge or
that one of them is neutral. Inclusion of counter ions
can reduce or eliminate electrostatic repulsion.



Sec. 2.3: Other considerations in the choice of IPBD:

If the chosen IPBD is an enzyme, it may be
necessary to change one or more residues in the active
site to inactivate enzyme function. For example, if
the IPBD were T4 lysozyme and the GP were E. coli cells
or M13, we would inactivate the lysozyme lest it lyse
the cells. If, on the other hand, the GP were PhiX174,
then inactivation of lysozyme may not be needed because
T4 lysozyme can be overproduced inside E. coli cells
without detrimental effects and PhiX174 forms
intracellularly. It is preferred to inactivate enzyme
IPBDs that might be harmful to the GP or its host by
substituting mutant amino acids at one or more residues
of the active site. It is permitted to vary one or
more of the residues that were changed to abolish the
original enzymatic activity of the IPBD. Those GPs
that receive osp-pbd genes encoding an active enzyme
may die, but the majority of sequences will not be
deleterious.

Sec. 3: Choice of OCV:

The OCV is preferably small, e.g., less than 10
kb. It is desirable that cassette mutagenesis be
practical in the OCV; preferably, at least 25
restriction enzymes are available that do not cut the
OCV. It is likewise desirable that single-stranded
mutagenesis be practical. Finally, the OCV preferably
carries a selectable marker. A suitable OCV is
obtained or is engineered by manipulation of available
vectors. Plasmids are preferred over the bacterial
chromosome because genes on plasmids are much more
easily constructed and mutated than are chromosomal
genes. When bacteriophage are to be used, the osp-ipbd
gene must be inserted into the phage genome.

For phage such as M13, an antibiotic resistance
gene is engineered into the genome (HINE80). More
virulent phage, such as PhiX174, make discernible
plaques that can be picked, in which case a resistance
gene is not essential; furthermore, there is no room in
the PhiX174 virion to add any new genetic material.
Inability to include an antibiotic resistance gene is a
disadvantage because it limits the number of GPs that
can be screened.

It is preferred that GP(IPBD) carry a selectable
marker not carried by wtGP. It is also preferred that
wtGP carry a selectable marker not carried by GP(IPBD).


We design an amino acid sequence that will cause
the IPBD to appear on the GP surface when it is
expressed. This amino acid sequence may determine the
entire coding region of the osp-ipbd gene, or it may
contain only the ipbd sequence adjoining restriction
sites into which random DNA will be cloned (Sec. 6.2).

The actual gene may be produced by any means. The
pbd segment, derived from the ipbd segment, must be
easily genetically manipulated in the ways described in
Part III. Synthetic ipbd segments are preferred
because they allow greatest control over placement of
restriction sites.

Sec. 4.1: Genetic regulation of the osp-ipbd gene:

Regarding regulation of the osp-ipbd gene, the two
important questions are: a) how much OSP-IPBD do we
need on each GP, and b) how accurately must we regulate
the amount?

The essential function of the affinity separation
is to separate GPs that bear PBDs (derived from IPBD)
having high affinity for the target from GPs bearing
PBDs having low affinity for the target. If a gradient
of some solute, such as increasing salt, changes the
conditions, then all weakly-binding PBDs will cease to
bind before any strongly-binding PBDs cease to bind.
Regulation of the osp-pbd gene must be such that all
packages display sufficient PBD to effect a good
separation in Sec. 15. If the amount of PBD/GP had an
effect on the elution volume of the GP from the
affinity matrix, then we would need to regulate the
amount of PBD/GP accurately. The following analysis
shows that there is no strong linear effect of IPBD/GP
on elution volume and assumes only: a) that all GPs are
the same size, b) that interactions between the PBDs
and the affinity matrix dominate differential elution
of GPs, c) that the system is at equilibrium, and d)
that all PBDs on any one GP are identical.

If N_b identical PBDs on a GP each have access to
target molecules, and each PBD has a free energy of
binding to the target of delta G_b, then the total free
energy of binding is

delta G_b^tot = N_b * delta G_b.

Delta G_b is a function of parameters of the solvent,
such as: 1) concentration of ions, 2) pH, 3)
temperature, 4) concentration of neutral solutes such
as sucrose, glucose, ethanol, etc., 5) specific ions,
such as calcium, acetate, benzoate, nicotinate, etc.
If conditions are altered during affinity separation so
that delta G_b approaches zero, delta G_b^tot approaches
zero N_b times faster. As delta G_b^tot goes to or above
zero, the packages will dissociate from the immobilized
target molecules and be eluted.

GPs bearing more PBDs have a sharper transition
between bound and unbound than packages with fewer of
the same PBDs. For equilibrium conditions, the midpoint
of the transition is determined only by the
solution conditions that bring the individual
interactions to zero free energy. The number of
PBDs/GP determines the sharpness of the transition.
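
The qualitative argument can be illustrated numerically. The
sketch below is our own simplification, not part of the
analysis above: it treats each GP as a two-state system (bound
or unbound) whose bound fraction follows a Boltzmann weighting
of delta G_b^tot = N_b * delta G_b. The two-state model, the
units (kcal/mol), and the temperature are assumptions made for
illustration.

```python
import math

R_KCAL = 0.0019872  # gas constant in kcal/(mol*K)


def fraction_bound(delta_g_b, n_b, temp_k=298.15):
    """Two-state estimate of the fraction of GPs bound to the matrix.

    delta_g_b : free energy of binding per PBD (kcal/mol, negative = favorable)
    n_b       : number of identical PBDs per GP with access to target
    The total free energy is n_b * delta_g_b, as in the text.
    """
    x = (n_b * delta_g_b) / (R_KCAL * temp_k)
    if x > 700:  # avoid floating-point overflow in exp(); effectively unbound
        return 0.0
    return 1.0 / (1.0 + math.exp(x))
```

Under these assumptions the midpoint (fraction 0.5) falls at
delta G_b = 0 for any N_b, while a larger N_b makes the
change from bound to unbound steeper on either side of the
midpoint, in line with the statement above.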


It should also be noted that the number of PBDs/GP
is usually influenced by physiological conditions so
that a sample of genetically identical GP(PBD)s may
contain GPs having different numbers of PBDs on the GP
surface. In a population of GP(vgPBD)s, each
sequence will appear on more than one GP, and the
actual number of PBDs/GP will vary from GP to GP within
some range. Within a variegated population of PBDs,
let PBD_x be the PBD with maximum affinity for the
target. If there is a linear effect on elution volume
of number of PBDs/GP, then the GPs having the greatest
number of PBD_x will be most retarded on the column.
When we culture the enriched population, the GP(PBD_x)
will be amplified and give rise to new GP(PBD_x)s having
varying numbers of PBD_x/GP. Thus the affinity
separation process of the present invention can
tolerate a linear effect of number of PBDs/GP on the
elution volume of the GP(PBD) unless strong binding to
target fortuitously causes the PBD to be displayed on
the GP only in low number.

Since there is no linear effect on elution volume
from the number of IPBDs/GP, need for highly accurate
regulation of IPBD/GP is not anticipated. Reproducible
gene expression is more easily controlled using
regulated rather than constitutive genetic elements.
The analysis above assumes that GP(IPBD)s are in
equilibrium between solution in buffer and bound to the
affinity matrix. Rate of elution may be an important
parameter in column affinity chromatography. In batch
elution from an affinity matrix or elution from an
affinity plate, the time that each buffer is in contact
with the affinity material may be an important
variable. The density of affinity molecules on the
matrix is an important variable in optimizing the
affinity separation. Because the analysis above is
qualitative, in Sec. 10 of the preferred embodiment we
experimentally optimize: 1) the density of IPBD on the
GP surface, 2) the density of affinity molecules on the
affinity matrix, 3) the initial ionic strength, 4) the
elution rate, and 5) the quantity of GP/(volume of
matrix) to be loaded on the column.

Transcriptional regulation of gene expression is
best understood and most effective, so we focus our
attention on the promoter. A number of promoters are
known that can be controlled by specific chemicals
added to the culture medium. For example, the lacUV5
promoter is induced if isopropylthiogalactoside is
added to the culture medium, for example, at between
1.0 uM and 10.0 mM. Hereinafter, we use "XINDUCE" as a
generic term for a chemical that induces expression of
a gene. If transcription of the osp-ipbd gene is
controlled by XINDUCE, then the number of OSP-IPBDs per
GP increases for increasing concentrations of XINDUCE
until a fall-off in the number of viable packages is
observed or until sufficient IPBD is observed on the
surface of harvested GP(IPBD)s.

The attributes that affect the maximum number of
OSP-IPBDs per GP are primarily structural in nature.
There may be steric hindrance or other unwanted
interactions between IPBDs if OSP-IPBD is substituted
for every wild-type OSP. Excessive levels of OSP-IPBD
may also adversely affect the solubility or
morphogenesis of the GP. For cellular and viral GPs,
as few as five copies of a protein having affinity for
another immobilized molecule have resulted in
successful affinity separations (FERE82a, FERE82b, and
SMIT85).

Another consideration of promoter regulation is
that it is useful later to know the range of regulation
of the osp-ipbd (Sec. 8). In particular, one should
determine how nearly the absence of XINDUCE leads to
the absence of IPBD on the GP surface; a non-leaky
promoter is preferred. Non-leakiness is useful: a) to
show that affinity of GP(osp-ipbd)s for AfM(IPBD) is
due to the osp-ipbd gene, and b) to allow growth of
GP(osp-ipbd) in the absence of XINDUCE if the expression
of osp-ipbd is disadvantageous. The lacUV5 promoter in
conjunction with the LacIq repressor is a preferred
example.

Sec. 4.2: DNA sequence design:

The present invention is not limited to a single
method of gene design. The following procedure is an
example of one method of gene design that fills the
needs of the present invention.

If the amino-acid sequence of OSP-IPBD is a
definite sequence, then the entire gene will be
constructed (Sec. 6.1). If random DNA is to be fused
to ipbd, then a "display probe" is constructed first;
the random DNA is then inserted to complete the
population of putative osp-ipbd genes (Sec. 6.3) from
which a functional osp-ipbd gene is identified by in
vivo selection or kindred techniques.


One may use any genetic engineering method to
produce the correct gene fusion, so long as one can
easily and accurately direct mutations to specific
sites in the pbd DNA subsequence (Sec. 14.1). For the
methods of mutagenesis considered here, however, the
DNA sequence for the osp-ipbd gene must be different
from any other DNA in the OCV. The degree and nature
of difference needed is determined by the method of
mutagenesis. If one replaces subsequences coding for the
PBD with vgDNA, then subsequences to be mutagenized
must be bounded by restriction sites that are unique
within the OCV. If single-stranded-oligonucleotide-
directed mutagenesis is to be used, then the DNA
sequence of the subsequence coding for the IPBD must be
unique within the OCV.



Regulatory elements include: a) promoters, b)
Shine-Dalgarno sequences, and c) transcriptional
terminators, and may be isolated from nature or
designed from knowledge of consensus sequences of
natural regulatory regions.



The coding portions of genes to be synthesized are
designed at the protein level and then encoded in DNA.
The amino acid sequences are chosen to achieve various
goals, including: a) display of an IPBD on the surface
of a GP, b) change of charge on an IPBD, and c)
generation of a population of PBDs from which to select
an SBD. The ambiguity in the genetic code is exploited
to allow optimal placement of restriction sites and to
create various distributions of amino acids at
variegated codons.
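
How a variegated codon maps to a distribution of amino acids
can be computed by enumeration. The sketch below assumes
equimolar base mixtures at each degenerate position and the
standard genetic code; the IUPAC degeneracy letters and the
example codon NNK are conventional illustrations of ours, not
variegation schemes taken from this text.

```python
from collections import Counter
from itertools import product

# Standard genetic code, codons ordered T, C, A, G at each position.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

# IUPAC nucleotide ambiguity codes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def codon_distribution(degenerate_codon):
    """Return {amino acid (or '*' for stop): probability} for a
    three-letter degenerate codon, assuming equimolar base mixtures."""
    codon = degenerate_codon.upper()
    if len(codon) != 3:
        raise ValueError("a codon has exactly three positions")
    choices = [IUPAC[letter] for letter in codon]
    counts = Counter(CODON_TABLE["".join(c)] for c in product(*choices))
    total = sum(counts.values())
    return {aa: n / total for aa, n in counts.items()}
```

For example, codon_distribution("NNK") enumerates 32 codons;
comparing such distributions for different mixtures shows how
the choice of bases at each position shifts the balance among
amino acids and stop codons.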






A computer program may be used to identify all
possible ambiguous DNA sequences coding for an
amino-acid sequence given by the user and to identify places
where recognition sites for site-specific restriction
enzymes could be provided without altering the
amino-acid sequence.
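
A minimal sketch of such a search is given below. It is not
the program referred to above, whose details are not given
here; it merely illustrates the idea under stated assumptions:
the standard genetic code, one-letter amino-acid codes, and
recognition sites written with unambiguous bases only. For
each nucleotide offset it asks whether every codon overlapped
by the site has at least one synonymous codon compatible with
the site.

```python
# Standard genetic code, codons ordered T, C, A, G at each position.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODONS_FOR = {}
for i, a in enumerate(BASES):
    for j, b in enumerate(BASES):
        for k, c in enumerate(BASES):
            CODONS_FOR.setdefault(AMINO[16 * i + 4 * j + k], []).append(a + b + c)


def silent_site_positions(protein, site):
    """Return 0-based nucleotide offsets at which `site` can be encoded
    without changing `protein` (one-letter codes, N- to C-terminal)."""
    protein = protein.upper()
    site = site.upper()
    hits = []
    for start in range(3 * len(protein) - len(site) + 1):
        first = start // 3
        last = (start + len(site) - 1) // 3
        feasible = True
        for ci in range(first, last + 1):
            # Bases the site demands at the three positions of codon ci
            # (None where the site does not reach that position).
            required = []
            for k in range(3):
                idx = 3 * ci + k - start
                required.append(site[idx] if 0 <= idx < len(site) else None)
            if not any(
                all(r is None or r == codon[k] for k, r in enumerate(required))
                for codon in CODONS_FOR[protein[ci]]
            ):
                feasible = False
                break
        if feasible:
            hits.append(start)
    return hits
```

The returned offsets can then be compared across candidate
enzymes to choose sites that break the gene into short
segments.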



Restriction sites are positioned within the
osp-ipbd gene so that the longest segment between sites is
as short as possible. Enzymes that produce cohesive
ends are preferred. The codon preferences of the
intended host and the secondary structure of the
mRNA are also considered.



Sec. 5.1: Method of gene synthesis:

An established strategy for gene synthesis is to
synthesize both strands of the entire gene in
overlapping segments of 20 to 50 nucleotides (nts)
(THER88). We prefer an alternative method that is more
suitable for synthesis of vgDNA. Our method differs
from previous methods (OLIP86, OLIP87, AUSU87) in that
we: a) use two synthetic strands, and b) do not cut the
extended DNA in the middle. Our goals are: a) to
produce longer pieces of dsDNA than can be synthesized
as ssDNA on commercial DNA synthesizers, and b) to
produce strands complementary to single-stranded vgDNA.
By using two synthetic strands, we remove the
requirement for a palindromic sequence at the 3' end.


DNA synthesizers can produce oligo-nts of up to
100 nts in reasonable yield, M_DNA = 100. The
parameters N_w (the length of overlap needed to obtain
efficient annealing) and N_s (the number of spacer bases
needed so that a restriction enzyme can cut near the
end of blunt-ended dsDNA) are determined by DNA and
enzyme chemistry. N_w = 10 and N_s = 5 are reasonable
values.



We divide the DNA sequence to be synthesized into
two nearly equal parts, each 5-8 bases longer than half
the total length, so that there is an overlap between
the two parts of 10 to 16 bp (N_w) containing no
variegated bases. The overlap preferably is not
palindromic and has high GC content. We synthesize the
overlap portion and the 5' extension of each strand.
When these strands are annealed and completed with
Klenow enzyme and all four NTPs, we obtain the desired
sequence as blunt-ended dsDNA. If the DNA is to be
ligated to other DNA having cohesive ends, five to ten
(N_s) bases are added to that end. The synthetic dsDNA
can then be cut efficiently with an appropriate
restriction enzyme (OLIP87).
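
The division into two strands can be sketched as follows.
This is an illustration under stated assumptions, not the
procedure actually used: the input is the top strand written
5' to 3' with unambiguous bases only (variegated positions are
not modelled), spacer bases are assumed to be already included
in the input, and the function names are ours. The defaults
follow the values given above (overlap of 10, at most 100 nts
per oligo).

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of an unambiguous DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def design_two_oligos(top_strand, overlap=10, max_oligo=100):
    """Split a sequence into two oligos whose 3' ends overlap.

    Returns (oligo_a, oligo_b, overlap_seq), all written 5'->3'.
    oligo_a is the 5' part of the top strand; oligo_b is the reverse
    complement of the 3' part, so the two anneal over `overlap` bases.
    """
    seq = top_strand.upper()
    if set(seq) - set("ACGT"):
        raise ValueError("only unambiguous bases A, C, G, T are handled")
    n = len(seq)
    if n <= overlap:
        raise ValueError("sequence must be longer than the overlap")
    len_a = (n + overlap + 1) // 2      # the longer (or equal) part
    len_b = n + overlap - len_a
    if len_a > max_oligo:
        raise ValueError("sequence too long for two oligos of this size")
    oligo_a = seq[:len_a]
    oligo_b = reverse_complement(seq[n - len_b:])
    overlap_seq = seq[n - len_b:len_a]
    return oligo_a, oligo_b, overlap_seq


def fill_in(oligo_a, oligo_b, overlap):
    """Top strand expected after annealing and polymerase extension."""
    return oligo_a + reverse_complement(oligo_b)[overlap:]
```

Before ordering the oligos, the returned overlap_seq can be
compared with its own reverse complement to confirm that it is
not palindromic, and its GC content can be counted directly.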

Because M_DNA is not rigidly fixed at 100, the
current limits of 190 (= 2 M_DNA - N_w) nts overall and
100 in each fragment are not rigid, but can be exceeded
by 5 or 10 nts. Going beyond the limits of 190 and 100
will lead to lower yields, but these may be acceptable
in certain cases.

Sec. 5.2: DNA synthesis and purification methods:

The present invention is not limited to any
particular method of DNA synthesis or construction.

In the preferred embodiment, DNA is synthesized by
standard means on a Milligen 7500 DNA synthesizer. The
Milligen 7500 has seven vials from which
phosphoramidites may be taken. Normally, the first
four contain A, C, T, and G. The other three vials
may contain unusual bases such as inosine or mixtures
of bases, the so-called "dirty bottle". The standard
software allows programmed mixing of two, three, or
four bases in equimolar quantities.

The present invention is not limited to any
particular method of purifying DNA for genetic
engineering. Agarose gel electrophoresis and
electroelution on an IBI device (International
Biotechnologies, Inc., New Haven, CT) is, preferably,
used to purify large dsDNA fragments. For oligo-nts,
PAGE and electroelution with an Epigene device (Epigene
Corp., Baltimore, MD) are an alternative to HPLC.

Sec. 6.1: Cloning of known osp-ipbd gene into OCV:

In the preferred method, the synthetic gene is
constructed using plasmids that are transformed into
bacterial cells by standard methods (MANI82, p250) or
slightly modified standard methods. Alternatively, DNA
fragments derived from nature are operably linked to
other fragments of DNA derived from nature or to
synthetic DNA fragments. In most cases of the
preferred method, gene synthesis involves construction
of a series of plasmids containing larger and larger
segments of the complete gene.



Display probe:



If random DNA and phenotypic
screening are used to obtain a GP(IPBD), then we clone
random DNA into one of the restriction sites that was
designed into the display probe.

The random DNA may be obtained in a variety of
ways. Degenerate synthetic DNA is one possibility.
Alternatively, pseudorandom DNA may be taken from
nature. If, for example, an Sph I site (GCATG/C) has
been designed into the display probe at one end of the
ipbd fragment, then we would use Nla III (CATG/) to
partially digest DNA that contains a wide variety of
sequences, generating a wide variety of fragments with
CATG 3' overhangs. Preferably, the display probe has
different restriction sites at each end of the ipbd
gene so that random DNA can be cloned at either end.
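
The compatibility of the two enzymes named above can be checked with a short sketch. It assumes only the recognition sites and cut positions quoted in the text (GCATG/C for Sph I, CATG/ for Nla III); the helper names and the test sequence are hypothetical.

```python
NLA_III = "CATG"   # Nla III cuts after CATG (CATG/)
SPH_I = "GCATGC"   # Sph I cuts after GCATG (GCATG/C)


def cut_positions(seq, site, offset):
    """Return 0-based top-strand cut positions for every occurrence of site.

    offset is the number of bases of the site that lie 5' of the cut.
    """
    positions = []
    start = seq.find(site)
    while start != -1:
        positions.append(start + offset)
        start = seq.find(site, start + 1)
    return positions


def three_prime_overhang(site, top_cut):
    """Return the 3' overhang left by a palindromic site cut at top_cut.

    For a palindromic site the bottom-strand cut mirrors the top-strand cut,
    so the single-stranded region is site[len(site) - top_cut:top_cut].
    """
    bottom_cut = len(site) - top_cut
    return site[bottom_cut:top_cut]


# Both enzymes leave the same 3' overhang, so the ends can anneal.
print(three_prime_overhang(NLA_III, 4), three_prime_overhang(SPH_I, 5))

# Hypothetical sequence with two Nla III sites.
print(cut_positions("AACATGTTCATGA", NLA_III, 4))
```

A partial digest cuts only some of these positions in any one molecule, which is what produces the wide variety of fragment lengths.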

A plasmid carrying the display probe is digested
with the appropriate restriction enzyme and the
fragmented, random DNA is annealed and ligated by
standard methods. The ligated plasmids are used to
transform cells that are grown and selected for
expression of the antibiotic-resistance gene. Plasmid-bearing
GPs are then selected for the display-of-IPBD
phenotype by the procedure given in Sec. 15 of the
present invention using AfM(IPBD) as if it were the
target. Sec. 15 is designed to isolate GP(PBD)s that
bind to a target from a large population that do not
bind.

iistsL; — £j dax£§pJn_J3t_j;^£s. — L 

Cells are transformed with ligated OCVs and
selected for uptake of OCV after an appropriate
incubation with an agent appropriate to the selectable
markers on the OCV. GPs are harvested by methods
appropriate to the GP at hand, generally,
centrifugation to pellet the GPs and resuspension of the
pellets in sterile medium (cells) or buffer (spores or
phage).

The harvested packages are now tested for display
of IPBD on the surface; any ions or cofactors known to
be essential for the stability of IPBD or AfM(IPBD)
must be included at appropriate levels. The tests can
be done: a) by affinity labeling, b) enzymatically,
c) ..., d) ..., or
e) by affinity precipitation. The AfM(IPBD) in this
step is one picked to have strong affinity
(preferably, K_d < 10^-11 M) for the IPBD molecule and
little or no affinity for the wtGP. For example, if
BPTI were the IPBD, trypsin, anhydrotrypsin, or
antibodies to BPTI could be used as the AfM(BPTI) to
test for the presence of BPTI. Anhydrotrypsin, a
trypsin derivative with serine 195 converted to
dehydroalanine, has no proteolytic activity but retains
its affinity for BPTI (AKQH72 and H0BE7?).

Preferably, the presence of the IPBD on the
surface of the GP is demonstrated through the use of a
soluble, labeled derivative of an AfM(IPBD) with high
affinity for IPBD. The labeled derivative of AfM(IPBD)
is denoted as AfM(IPBD)*.



If random DNA has been used, then the procedures
of Sec. 15 are used to obtain a clonal isolate that has
the display-of-IPBD phenotype. Alternatively, clonal
isolates may be screened for the display-of-IPBD
phenotype. The tests of this step are applied to one
or more of these clonal isolates.

If no isolates that bind to the affinity molecule
are obtained, we take corrective action as disclosed in
Sec. 9.

If one or more of the tests indicates that the
IPBD is displayed on the GP surface, we verify that the
binding of molecules having known affinity for IPBD is
due to the chimeric osp-ipbd gene through the use of
standard genetic and biochemical techniques, such as:






1) transferring the osp-ipbd gene into the parent
GP to verify that osp-ipbd confers binding,

2) deleting the osp-ipbd gene from the isolated GP
to verify that loss of osp-ipbd causes loss of
binding,

3) showing that binding of GPs to AfM(IPBD)
correlates with [XINDUCE] (in those cases that
expression of osp-ipbd is controlled by
[XINDUCE]), and

4) showing that binding of GPs to AfM(IPBD) is
specific to the immobilized AfM(IPBD) and not to
the support matrix.



Presence of IPBD on the GP surface is indicated by
a strong correlation between [XINDUCE] and the
reactions that are linear in the amount of IPBD (such
as: a) binding of GPs by soluble AfM(IPBD)*, b)
absorption caused by IPBD, and c) biochemical reactions
of IPBD). The demonstration (4) that binding is to
AfM(IPBD) and the genetic tests (1) and (2) are
important; the test with XINDUCE (3) is less so.

We sequence the relevant ipbd gene fragment from
each of several clonal isolates to determine the
construction.

We establish the maximum salt concentration and pH
range for which the GP(IPBD) binds the chosen
AfM(IPBD).

If the IPBD is displayed on the outside of the GP,
and if that display is clearly caused by the introduced
osp-ipbd gene, we proceed to Part II; otherwise we must
analyze the result and adopt appropriate corrective
measures.



2m...t^...Mml^^M&mL 



If we have attempted to fuse an ipbd fragment to a
natural osp fragment, our options are:

1) pick a different fusion to the same osp by

a) using opposite end of osp,

b) keeping more or fewer residues from osp in
the fusion; for example, in increments of 3

c) trying a known or predicted domain
boundary,

d) trying a predicted loop or turn position,

2) pick a different osp, or

3) switch to random DNA method.

If we have just tried the random DNA method
unsuccessfully, our options are:

1) choose a different relationship between ipbd
fragment and random DNA (ipbd first, random DNA
second or vice versa),

2) try a different degree of partial digestion, a
different enzyme for partial digestion, a
different degree of shearing or a different source
of natural DNA, or

3) switch to the natural OSP method.

If all reasonable OSPs of the current GP have been
tried and the random DNA method has been tried, both
without success, we pick a new GP.

Part II

In Part II we optimize an affinity separation
system that will be used in Part III to enrich a
population of GP(vgPBD)s for those GP(PBD)s that
display PBDs with increased affinity for the target.

Affinity chromatography is the preferred means,
but FACS, electrophoresis, or other means may also be
used.

Sec. 10.1: Optimization of Affinity Chromatography

Changes in eluant concentration cause GPs to elute
from the column. Elution volume, however, is more
easily measured and specified. It is to be understood
that the eluant concentration is the agent causing GP
release and that an eluant concentration can be
calculated from an elution volume and the specified
gradient.

Using a specified elution regime, we compare the
elution volumes of GP(IPBD)s with the elution volumes
of wtGP on affinity columns supporting AfM(IPBD).
Comparisons are made at various: a) amounts of IPBD/GP,
b) densities of AfM(IPBD)/(volume of matrix) (DoAMoM),
c) initial ionic strengths, d) elution rates, e)
amounts of GP/(volume of support), f) pHs, and g)
temperatures, because these are the parameters most
likely to affect the sensitivity and efficiency of the
separation. We then pick those conditions giving the
best separation.

We do not optimize pH or temperature; rather we
record optimal values for the other parameters for one
or more values of pH and temperature. The conditions
of intended use, specified by the user (Sec. 11), may
include a specification of pH or temperature. If pH is
specified, then pH will not be varied in eluting the
column (Sec. 13.3). Decreasing pH may be used to
liberate bound GPs from the matrix. If the intended
use specifies a temperature, we will hold the affinity
column at the specified temperature during elution, but
we might vary the temperature during recovery.

The AfM(IPBD) is preferably one known to have
moderate affinity for the IPBD (K_d in the range 10^-6 M
to 10^-8 M). When populations of GP(vgPBD)s are
fractionated, there will be roughly three
subpopulations: a) those with no binding, b) those that
have some binding but can be washed off with high salt
or low pH, and c) those that bind very tightly and must
be rescued in situ. We optimize the parameters to
separate (a) from (b) rather than (b) from (c). Let
PBD_W be a PBD having weak binding to the target and
PBD_S be a PBD having strong binding. Higher DoAMoM
might, for example, favor retention of GP(PBD_W) but
also make it very difficult to elute viable GP(PBD_S).
We will optimize the affinity separation to retain
GP(PBD_W) rather than to allow release of GP(PBD_S)
because a tightly bound GP(PBD_S) can be rescued by in
situ growth. If we find that DoAMoM strongly affects
the elution volume, then in Part III we may reduce the
amount of target on the affinity column when an SBD has
been found with moderately strong affinity (K_d on the
order of 10^-? M) for the target.

In this step, we measure elution volumes of
genetically pure GPs that elute from the affinity
matrix as sharp bands that can be detected by UV
absorption. Samples from effluent fractions are plated
on suitable medium (cells or spores) or on sensitive
cells (phage) and colonies or plaques counted.

Several values of IPBD/GP, DoAMoM, elution rates,
initial ionic strengths, and loadings should be
examined. We anticipate that optimal values of IPBD/GP
and DoAMoM will be correlated and therefore should be
optimized together. The effects of initial ionic
strength, elution rate, and amount of GP/(matrix
volume) are unlikely to be strongly correlated, and so
they can be optimized independently.

For each set of parameters to be tested, the
column is eluted in a specified manner. For example,
we may use a regime called Elution Regime 1: a KCl
gradient runs from 10 mM to maximum allowed for the
GP(IPBD) viability in 100 fractions of 0.05 V_v (void
volume), followed by 20 fractions of 0.05 V_v at maximum
allowed KCl; pH of the buffer is maintained at the
specified value with a convenient buffer such as Tris.
It is important that the conditions of this
optimization be similar to the conditions that are used
in Part III for selection for binding to target (Sec.
15.3) and recovery of GPs from the chromatographic
system (Sec. 15.4).
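
The conversion from elution volume to eluant concentration under Elution Regime 1 can be sketched as follows. The sketch assumes the gradient is linear, which the text does not state; the 1000 mM maximum used in the example is a hypothetical value, and the function name is illustrative.

```python
def eluant_concentration(volume_vv, max_kcl_mM, start_kcl_mM=10.0,
                         gradient_fractions=100, hold_fractions=20,
                         fraction_size_vv=0.05):
    """Return the KCl concentration (mM) at an elution volume given in void volumes.

    Assumes a linear gradient from start_kcl_mM to max_kcl_mM over the
    gradient fractions, followed by a hold at max_kcl_mM.
    """
    gradient_volume = gradient_fractions * fraction_size_vv
    total_volume = gradient_volume + hold_fractions * fraction_size_vv
    if not 0.0 <= volume_vv <= total_volume:
        raise ValueError("elution volume lies outside the regime")
    if volume_vv >= gradient_volume:
        return max_kcl_mM
    slope = (max_kcl_mM - start_kcl_mM) / gradient_volume
    return start_kcl_mM + slope * volume_vv


# Hypothetical maximum of 1000 mM KCl.
for v in (0.0, 2.5, 5.5):
    print(v, eluant_concentration(v, 1000.0))
```

With the default values the gradient occupies the first 5 void volumes and the hold the next 1 void volume.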

When the osp-ipbd gene is regulated by [XINDUCE],
IPBD/GP can be controlled by varying [XINDUCE].
Appropriate values of [XINDUCE] depend on the identity
of XINDUCE and the promoter; if, for example, XINDUCE
is isopropylthiogalactoside (IPTG) and the promoter is
lacUV5, then [IPTG] = 0, 0.1 uM, 1.0 uM, 10.0 uM, 100.0
uM, and 1.0 mM are appropriate levels to test. The
range of variation of [XINDUCE] is extended until an
optimum is found or an acceptable level of expression
is obtained.

DoAMoM is varied from the maximum that the matrix
material can bind to 1% or 0.1% of this level in
appropriate steps. We anticipate that the efficiency
of separation will be a smooth function of DoAMoM so
that it is appropriate to cover a wide range of values
for DoAMoM with a coarse grid and then explore the
neighborhood of the approximate optimum with a finer
grid.
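
The coarse-then-fine search over DoAMoM can be sketched as below. This is only an illustration: the log spacing, the grid sizes, and the score function (which in practice would be a measured separation quality, not a formula) are all assumptions.

```python
import math


def log_grid(low, high, points):
    """Return `points` values evenly spaced on a log10 scale from low to high."""
    step = (math.log10(high) - math.log10(low)) / (points - 1)
    return [10 ** (math.log10(low) + i * step) for i in range(points)]


def coarse_then_fine(score, maximum, floor_fraction=0.001,
                     coarse_points=7, fine_points=9):
    """Locate the best value of a smooth score with a coarse grid, then a finer one.

    The coarse grid spans floor_fraction * maximum up to maximum; the fine
    grid spans the two coarse neighbors of the best coarse point.
    """
    coarse = log_grid(maximum * floor_fraction, maximum, coarse_points)
    best = max(range(len(coarse)), key=lambda i: score(coarse[i]))
    low = coarse[max(best - 1, 0)]
    high = coarse[min(best + 1, len(coarse) - 1)]
    return max(log_grid(low, high, fine_points), key=score)


# Hypothetical smooth score that peaks at a density of 10 (arbitrary units).
def example_score(density):
    return -(math.log10(density) - 1.0) ** 2


print(coarse_then_fine(example_score, maximum=1000.0))
```

The two-stage search is reasonable only if the efficiency really is a smooth, single-peaked function of DoAMoM, as anticipated above.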

Several values of initial ionic strength are
tested, such as 1.0 mM, 5.0 mM, 10.0 mM, and 30.0 mM.

The elution rate is varied, by successive factors
of 1/2, from the maximum attainable rate to 1/16 of
this value. The fastest elution rate giving good
separation is optimal.

The goal of the optimization is to obtain a sharp
transition between bound and unbound GPs, triggered by
increasing salt or decreasing pH or a combination of
both. This optimization need be performed only: a) for
each temperature to be used, b) for each pH to be used,
and c) when a new GP(IPBD) is created.

Regulatable promoters are available for all
genetic packages except, possibly, bacterial spores. A
promoter functional in bacterial spores might be
prepared by constructing a hybrid of a sporulation
promoter and a regulatable bacterial promoter (e.g.,
lac), or by saturation mutagenesis of a sporulation
promoter followed by screening for regulatable promoter
activity (cf. QtdPSS, OX.XP8?). When the promoter of
the osp-ipbd gene is not regulatable, we optimize
DoAMoM, the elution rate, and the amount of GP/volume
of matrix. If the optimized affinity separation is not
acceptable, we must develop a means to alter the amount
of IPBD per GP.

Sec. 10.2: Measuring the Sensitivity

We determine the sensitivity of the affinity
separation (C_sensi) by measuring the minimum quantity
of GP(IPBD) that can be detected in the presence of a
large excess of wtGP. The user chooses a number of
separation cycles, denoted N_chrom, that will be
performed before an enrichment is abandoned;
preferably, N_chrom is in the range 6 to 10 and N_chrom
must be greater than 4. Enrichment can be terminated
by isolation of a desired GP(SBD) before N_chrom passes.

The measurement of sensitivity is significantly
expedited if GP(IPBD) and wtGP carry different
selectable markers.

Mixtures of GP(IPBD) and wtGP are prepared in the
ratios of 1:V_lim, where V_lim ranges by an appropriate
factor (e.g., 1/10) over an appropriate range, typically
10^13 through 10^4. Large values of V_lim are tested
first; once a positive result is obtained for one value
of V_lim, no smaller values of V_lim need be tested.
Each mixture is applied to a column supporting, at the
optimal DoAMoM, an AfM(IPBD) having high affinity for
IPBD and the column is eluted by the specified elution
regime. The last fraction that contains viable GPs and
an inoculum of the column matrix material are cultured.
If GP(IPBD) and wtGP have different selectable markers,
then transfer onto selection plates identifies each
colony. Otherwise, a number (e.g., 32) of GP clonal
isolates are tested for presence of IPBD by the
techniques discussed in Sec. 8.

If IPBD is not detected on the surface of any of
the isolated GPs, then GPs are pooled from: a) the last
few (e.g. 3 to 5) fractions that contain viable GPs
and b) an inoculum taken from the column matrix. The
pooled GPs are cultured and passed over the same column
and enriched for GP(IPBD) in the manner described.
This process is repeated until N_chrom passes have been
performed, or until the IPBD has been detected on the
GPs. If GP(IPBD) is not detected after N_chrom passes,
V_lim is decreased and the process is repeated.

C_sensi equals the highest value of V_lim for which
the user can recover GP(IPBD) within N_chrom passes.
The number of chromatographic cycles (K_cyc) that were
needed to isolate GP(IPBD) gives a rough estimate of
C_eff; C_eff is approximately the K_cyc-th root of V_lim:

$$ C_{eff} \approx \exp\left\{ \frac{\log_e(V_{lim})}{K_{cyc}} \right\} $$

For example, if V_lim were 4.0 x 10^8 and three
separation cycles were needed to isolate GP(IPBD), then
C_eff = (approx.) 736.
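
The rough estimate of C_eff is simply the K_cyc-th root of V_lim, which the following sketch evaluates. The function name is illustrative; the inputs are the example values of V_lim = 4.0 x 10^8 and three cycles.

```python
import math


def estimate_c_eff(v_lim, k_cyc):
    """Return the K_cyc-th root of V_lim, computed as exp(ln(V_lim) / K_cyc)."""
    if v_lim <= 0 or k_cyc <= 0:
        raise ValueError("v_lim and k_cyc must be positive")
    return math.exp(math.log(v_lim) / k_cyc)


c_eff = estimate_c_eff(4.0e8, 3)
print(f"C_eff is roughly {c_eff:.1f}")
```

The result lies between 736 and 737, consistent with the approximate value quoted in the example.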

To determine C_eff more accurately, we determine
the ratio of GP(IPBD)/wtGP loaded onto an AfM(IPBD)
column that yields approximately equal amounts of
GP(IPBD) and wtGP after elution.

Sec. 10.3: Other Separation Means

Other separation means are optimized in a manner
parallel to that used for affinity chromatography.

FACS (e.g. FACStar from Becton-Dickinson,
Mountain View, CA) is most appropriate for bacterial
cells and spores because the sensitivity of the
machines requires approximately 1000 molecules of
fluorescent label bound to each GP to accomplish a
separation. To optimize FACS separation of GPs, we use
a derivative of AfM(IPBD) that is labeled with a
fluorescent molecule, denoted AfM(IPBD)*. The
variables that must be optimized include: a) amount of
IPBD/GP, b) concentration of AfM(IPBD)*, c) ionic
strength, d) concentration of GPs, and e) parameters
pertaining to operation of the FACS machine. Because
AfM(IPBD)* and GPs interact in solution, the binding
will be linear in both [AfM(IPBD)*] and [displayed
IPBD]. Preferably, these two parameters are varied
together. The other parameters can be optimized
independently.

Electrophoresis is most appropriate to
bacteriophage because of their small size (Smmi).
Electrophoresis is a preferred separation means if the
target is so small that chemically attaching it to a
column or to a fluorescent label would essentially
change the entire target. For example, chloroacetate
ions contain only seven atoms and would be essentially
altered by any linkage. GPs that bind chloroacetate
would become more negatively charged than GPs that do
not bind the ion and so these classes of GPs could be
separated.

The parameters to optimize for electrophoresis
include: a) IPBD/GP, b) concentration of gel material,
e.g. agarose, c) concentration of AfM(IPBD), d) ionic
strength, e) size, shape, and cooling capacity of the
electrophoresis apparatus, f) voltages and currents,
and g) concentration of GPs. Preferably, IPBD/GP and
[AfM(IPBD)] are varied at the same time and other
parameters are optimized independently.

Part III

Sec. 11.0: Choice of target material

Any material may be chosen as target material,
subject only to the following restrictions:

If affinity chromatography is to be used, then:

1) the molecules of the target material must be of
sufficient size and chemical reactivity to be
applied to a solid support suitable for affinity
separation,

2) after application to a matrix, the target
material must not react with water,

3) after application to a matrix, the target
material must not bind or degrade proteins in a
non-specific way, and

4) the molecules of the target material must be
sufficiently large that attaching the material to
a matrix allows enough unaltered surface area
(generally at least 500 Å^2, excluding the atom
that is connected to the linker) for protein
binding.



If FACS is to be used as the affinity separation
means, then:

1) the molecules of the target material must be of
sufficient size and chemical reactivity to be
conjugated to a suitable fluorescent dye or the
target must itself be fluorescent,

2) after any necessary fluorescent labeling, the
target must not react with water,

3) after any necessary fluorescent labeling, the
target material must not bind or degrade proteins
in a non-specific way, and

4) the molecules of the target material must be
sufficiently large that attaching the material to
a suitable dye allows enough unaltered surface
area (generally at least 500 Å^2, excluding the
atom that is connected to the linker) for protein
binding.

If affinity electrophoresis is to be used, then:

1) the target must either be charged or of such a
nature that its binding to a protein will change
the charge of the protein,

2) the target material must not react with water,

3) the target material must not bind or degrade
proteins in a non-specific way, and

4) the target must be compatible with a suitable
gel material.

Possible target materials include, but are not
limited to: a) soluble proteins (such as horse heart
myoglobin, human neutrophil elastase, activated (blood
clotting) factor X, alpha-fetoprotein, alpha
interferon, melittin, Bordetella pertussis adenylate
cyclase toxin, any retroviral pol protease or any
retroviral gag protease), b) lipoproteins (such as
human low density lipoprotein), c) glycoproteins (such
as a monoclonal antibody), d) lipopolysaccharides (such
as O-antigen of Salmonella enteritidis), e) nucleic
acids (such as tRNAs, ribosomal RNAs, messenger RNAs,
dsDNA or ssDNA, possibly with sequence specificity), f)
soluble organic molecules (such as cholesterol,
aspartame, bilirubin, morphine, codeine,
dichlorodiphenyltrichloroethane (DDT), benzo(a)pyrene,
prostaglandin PGB2, protoporphyrin IX, or actinomycin
D), g) organometallic complexes (such as iron haem or
cobalt haem), h) organic polymers (such as cellulose or
chitin), i) insoluble minerals (such as asbestos,
zeolites, or hydroxylapatite), j) viral and phage coat
proteins (such as influenza haemagglutinin or phage
lambda capsid), and k) bacterial membrane or outer
membrane proteins (such as LamB from E. coli or
flagella proteins).

A supply of several milligrams of pure target
material is desired. Impure target material could be
used, but one might obtain a protein that binds to a
contaminant instead of to the target.

The following information about the target
material is highly desirable:

1) stability as a function of temperature, pH, and
ionic strength,

2) stability with respect to chaotropes such as
urea or guanidinium Cl,

3) pI,

4) molecular weight,

5) requirements for prosthetic groups or ions,
such as haem or Ca+2, and

6) proteolytic activity, if any.

In addition to this most desirable information, it
is useful to know: 1) the target's sequence, if the
target is a macromolecule, 2) the 3D structure of the
target, 3) enzymatic activity, if any, and 4) toxicity,
if any.

The user of the present invention specifies
certain parameters of the intended use of the binding
protein:

1) the acceptable temperature range,

2) the acceptable pH range,

3) the acceptable concentrations of ions and
neutral solutes,

4) the maximum acceptable dissociation constant
for the target and the SBD:

$$ K_T = \frac{[\mathrm{Target}]\,[\mathrm{SBD}]}{[\mathrm{Target{:}SBD}]} $$

In some cases, the user may require discrimination
between T, the target, and N, some non-target. Let

$$ K_T = \frac{[T]\,[\mathrm{SBD}]}{[T{:}\mathrm{SBD}]}, \quad \text{and} \quad K_N = \frac{[N]\,[\mathrm{SBD}]}{[N{:}\mathrm{SBD}]}, $$

then

$$ \frac{K_T}{K_N} = \frac{[T]\,[N{:}\mathrm{SBD}]}{[N]\,[T{:}\mathrm{SBD}]}. $$

The user then specifies a maximum acceptable value for
the ratio K_T/K_N.
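
The two dissociation constants and their ratio can be evaluated numerically as in the sketch below. All concentrations are hypothetical values in mol/L chosen only to illustrate the arithmetic, and the function names are not taken from the text.

```python
import math


def dissociation_constant(free_ligand, free_sbd, complex_conc):
    """Return K = [ligand][SBD] / [ligand:SBD] from equilibrium concentrations."""
    return free_ligand * free_sbd / complex_conc


def discrimination_ratio(k_target, k_nontarget):
    """Return K_T / K_N; smaller values mean stronger preference for the target."""
    return k_target / k_nontarget


# Hypothetical equilibrium concentrations (mol/L).
t, n, sbd = 1e-6, 1e-6, 1e-6
t_sbd, n_sbd = 1e-3, 1e-6

k_t = dissociation_constant(t, sbd, t_sbd)
k_n = dissociation_constant(n, sbd, n_sbd)
ratio = discrimination_ratio(k_t, k_n)

# The ratio also equals ([T][N:SBD]) / ([N][T:SBD]); [SBD] cancels.
direct = (t * n_sbd) / (n * t_sbd)
print(k_t, k_n, ratio, math.isclose(ratio, direct))
```

Because the free SBD concentration cancels, the ratio can be checked from the four remaining concentrations alone.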

If the target material is a general protease, one
must consider the following points:

1) a highly specific protease can be treated like
any other target,

2) a general protease, such as subtilisin, may
degrade the OSPs of the GP including osp-PBDs;
there are several alternative ways of dealing with
general proteases, including: a) a chemical
inhibitor may be used to prevent proteolysis (e.g.
phenylmethylfluorosulfate (PMFS) that inhibits
serine proteases), b) one or more active-site
residues may be mutated to create an inactive
protein (e.g., a serine protease in which the
active serine is mutated to alanine), or c) one or
more active-site amino-acids of the protein may be
chemically modified to destroy the catalytic
activity (e.g., a serine protease in which the
active serine is converted to anhydroserine),

3) SBDs selected for binding to a protease need
not be inhibitors; SBDs that happen to inhibit
the protease target are a fairly small subset of
SBDs that bind to the protease target,

4) the more we modify the target protease, the
less likely we are to obtain an SBD that inhibits
the target protease, and

5) if the user requires that the SBD inhibit the
target protease, then the active site of the
target protease must not be modified any more than
necessary; inactivation by mutation or chemical
modification are preferred methods of inactivation
and a protein protease inhibitor becomes a prime
candidate for IPBD. For example, BPTI could be
mutated, by the methods of the present invention,
to bind to proteases other than trypsin (TME7?

The user must pick a GP(IPBD) that is suitable to
the chosen target according to the criteria of Sec. 2.
It is anticipated that a small collection of
GP(IPBD)s can be assembled such that, for any chosen
target, at least one member of the collection will be a
suitable starting point for engineering a protein that
binds to the chosen target by the methods of the
present invention. The user should optimize the
affinity separation for conditions appropriate to the
intended use by the methods described in Part II.

gee. mmim^ ^^ l...lmAly. of PSDs^Pela,;:^d 

15 to PFBD- to Be generated 

Sec,, , 13,1: ^m^m^mj^m^..oR..J^ (or^tMsL^iqi, 
toumryi 

We choose residues in the IPBD to vary through
consideration of several factors, including: a) the 3D
structure of the IPBD, b) sequences homologous to IPBD,
and c) modeling of the IPBD and mutants of the IPBD.
Because the number of residues that could strongly
influence binding is always greater than the number
that can be varied simultaneously, the user must pick a
subset of those residues to vary at one time. The user
must also pick trial levels of variegation and
calculate the abundances of various sequences. The
list of varied residues and the level of variegation at
each varied residue are adjusted until the composite
variegation is commensurate with C_sensi and M_ntv.

A key concept is that only structured proteins
exhibit specific binding, i.e. can bind to a particular
chemical entity to the exclusion of most others. Thus
the residues to be varied are chosen with an eye to
preserving the underlying IPBD structure.
Substitutions that prevent the PBD from folding will
cause GPs carrying those genes to bind indiscriminately
so that they can easily be removed from the population.

Burial of hydrophobic surfaces so that bulk water
is excluded is one of the strongest forces driving the
binding of proteins to other molecules. Bulk water can
be excluded from the region between two molecules only
if the surfaces are complementary. We must test as
many surfaces as possible to find one that is
complementary to the target. The selection-through-binding
isolates those proteins that are more nearly
complementary to some surface on the target. The
effective diversity of a variegated population is
measured by the number of different surfaces, rather
than the number of protein sequences. Thus we should
maximize the number of surfaces generated in our
population, rather than the number of protein
sequences.
In hypothetical example 1, we consider a
hypothetical PBD, shown in Figure 3, binding to a
hypothetical target. Figure 3 is a 2D schematic of 3D
objects; by hypothesis, residues 1, 2, 4, 6, 7, 13, 14,
15, 20, 21, 22, 27, 29, 31, 33, 34, 36, 37, 38, and 39
of the IPBD are on the 3D surface of the IPBD, even
though shown well inside the circle. Proteins do not
have distinct, countable faces. Therefore we define an
"interaction set" to be a set of residues such that all
members of the set can simultaneously touch one
molecule of the target material without any atom of the
target coming closer than van der Waals distance to any
main-chain atom of the IPBD. The concept of a residue
"touching" a molecule of the target is discussed below.
One hypothetical interaction set, Set A, in Figure 3
comprises residues 6, 7, 20, 21, 22, 33, and 34,
represented by squares. Another hypothetical
interaction set, Set B, comprises residues 1, 2, 4, 6,
31, 37, and 39, represented by circles.

If we vary one residue, number 21 for example,
through all twenty amino acids, we obtain 20 protein
sequences and 20 different surfaces for interaction set
A. Note that residue 6 is in two interaction sets and
variation of residue 6 through all 20 amino acids
yields 20 versions of interaction set A and 20 versions
of interaction set B.

Now consider varying two residues, each through
all twenty amino acids, generating 400 protein
sequences. If the two residues varied were, for
example, number 1 and number 21, then there would be
only 40 different surfaces because interaction set A
does not depend on residue 1 and interaction set B does
not depend on residue 21. If the two residues varied,
however, were number 7 and number 21, then 400 surfaces
would be generated.

If N spatially separated residues are varied at
one time, 20 x N surfaces are generated. Variation of
N residues in the same interaction set yields 20^N
surfaces. For example, if N = 7, variation of
separated residues yields 140 surfaces while variation
of interacting residues yields 20^7 = 1.28 x 10^9
surfaces. Thus, to maximize the number of surfaces
generated when N residues are varied, all residues
should be in the same interaction set.
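
The counting argument of hypothetical example 1 can be reproduced with a short sketch. It uses the memberships of Set A and Set B given above and assumes one reading of the text: each interaction set contributes 20^k distinct surfaces when k of its members are varied, and the contributions of different sets are added. The function name is illustrative.

```python
SET_A = frozenset({6, 7, 20, 21, 22, 33, 34})
SET_B = frozenset({1, 2, 4, 6, 31, 37, 39})


def surface_count(varied, interaction_sets, alphabet=20):
    """Count distinct surfaces produced by varying the given residues.

    Each interaction set containing k >= 1 varied residues contributes
    alphabet ** k surfaces; sets with no varied residue contribute none.
    """
    varied = set(varied)
    total = 0
    for members in interaction_sets:
        k = len(members & varied)
        if k:
            total += alphabet ** k
    return total


sets = [SET_A, SET_B]
print(surface_count({21}, sets))      # one residue in Set A
print(surface_count({1, 21}, sets))   # one residue in each set
print(surface_count({7, 21}, sets))   # two residues in Set A
print(20 ** 7)                        # seven residues in one set
```

Residues placed in the same interaction set multiply the number of surfaces, while residues in separate sets only add to it.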
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The amount; of surface area buried in strong 
protein-protein interactions ranges from 1000 Å^2 to
2000 Å^2 (SCHU79, p105ff). Individual amino acids have
total surface areas that depend mostly on type of amino
acid and weakly on conformation. Those areas range
from about 180 Å^2 for glycine to about 360 Å^2 for
tryptophan. From amino-acid solvent exposures of
published protein structures, we calculate that 1000 Å^2
on a protein surface comprises between 4 and 30
amino-acid residues. Varied amino acid sequences, as found
in actual proteins, involve between 10 and 25 residues
in forming 1000 Å^2 of protein surface. Schulz and
Schirmer estimate that 100 Å^2 of protein surface can
exhibit as many as 1000 different specific patterns
(SCHU79, p105). The number of surface patterns rises
exponentially with the area that can be varied
independently. One of the BPTI structures recorded in
the Brookhaven Protein Data Bank (6PTI), for example,
has a total exposed surface area of 3997 Å^2 (using the
method of Lee and Richards (LEEB71) and a solvent
radius of 1.4 Å and atomic radii as shown in Table ?).
If we could vary this surface freely and if 100 Å^2 can
produce 1000 patterns, we could construct 10^120
different patterns by varying the surface of BPTI!
This calculation is intended only to suggest the huge
number of possible surface patterns based on a common
protein backbone.
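
The order-of-magnitude figure can be checked with the sketch below, which works in log10 to avoid enormous numbers. It assumes, as the estimate itself does, that each 100 Å^2 patch varies independently of the others; the function name is illustrative.

```python
import math


def log10_pattern_count(exposed_area, patch_area=100.0, patterns_per_patch=1000.0):
    """Return log10 of the number of surface patterns for independent patches.

    The count is patterns_per_patch ** (exposed_area / patch_area).
    """
    patches = exposed_area / patch_area
    return patches * math.log10(patterns_per_patch)


exponent = log10_pattern_count(3997.0)
print(f"about 10^{exponent:.0f} patterns")
```

About 40 patches of 100 Å^2, each with 1000 patterns, give an exponent near 120.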

One protein framework cannot, however, display all
possible patterns over any one particular 100 Å^2 of
surface merely by replacement of the side groups of
surface residues. The protein backbone holds the
varied side groups in approximately constant locations
so that the variations are not independent. We can,
nevertheless, generate a vast collection of different
protein surfaces by varying those protein residues that
face the outside of the protein.

Examination of a model of BPTI in contact with
myoglobin shows that residues 3, 7, 8, 10, 13, 30, 41,
and 42 can all simultaneously contact a molecule the
size and shape of myoglobin. Residue 49 cannot touch a
single myoglobin molecule simultaneously with any of
the first set even though all are on the surface of
BPTI. It is not the intent of the present invention,
however, to use models to determine which part of the
target molecule will actually be the site of binding by
a PBD.

IS 

For cassette mutagenesis, the protein residues to 
be varied are, preferably,, close enough in sequence 
that the variegated Wtk (vgDNA) encoding all of them 
can be made in one piece . The present invention is not 
20 limited to a particular length of vgDNA that can be 
synthesized. With current technology, a stretch of 60 
amino acids (180 DMA basse) can he spanned. 

One can use other mutational means, such as
single-stranded-oligonucleotide-directed mutagenesis
(BOTS85) using two or more mutating primers to mutate
widely separated residues.

Alternatively, to vary residues separated by more
than sixty residues, two cassettes may be mutated. A
first cassette is mutagenized to produce a population
having, for example, up to 30,000 members. Using
variegated OCV, we mutagenize a second cassette to
produce a second variegated population having the
desired diversity.

The composite level of variation must not exceed
the prevailing capabilities to a) produce very large
numbers of independently transformed cells or b) detect
small components in a highly varied population. The
limits on the level of variegation are discussed in
Sec. 13.2.



We assemble the data about the IPBD and the target
that are useful in deciding which residues to vary: 1)
3D structure, or at least a list of residues on the
surface of the IPBD, 2) list of sequences homologous to
IPBD, and 3) model of the target molecule or a stand-in
for the target.

These data and an understanding of the behavior of
different amino acids in proteins will be used to
answer the following questions:

1) which residues of the IPBD are on the outside
and close enough together in space to touch the
target simultaneously?

2) which residues of the IPBD can be varied with
high probability of retaining the underlying IPBD
structure?



Although an atomic model of the target material
(from X-ray crystallography, NMR, etc.) is preferred in
such examination, it is not necessary. For example, if
the target were a protein of unknown 3D structure, it
would be sufficient to know the molecular weight of the
protein and whether it were a soluble globular protein,
a fibrous protein, or a membrane protein. One can then
choose a protein of known structure of the same class
and similar size and shape to use as a molecular
stand-in and yardstick. At low resolution, all proteins
of a given size and class look much the same. The
specific volumes are the same, all are more or less
spherical, and therefore all proteins of the same size
and class have about the same radius of curvature. The
radii of curvature of the two molecules determine how
much of the two molecules can come into contact.

The most appropriate method of picking the
residues of the protein chain at which the amino acids
should be varied is by viewing, with interactive
computer graphics, a model of the IPBD. A stick-figure
representation of molecules is preferred. A suitable
set of hardware is an Evans & Sutherland PS390 graphics
terminal (Evans & Sutherland Corporation, Salt Lake
City, UT) and a MicroVAX II supermicro computer
(Digital Equipment Corp., Maynard, MA). Suitable
programs for viewing and manipulating protein models
include: a) PS-FRODO, written by T. A. Jones (JONE85)
and distributed by the Biochemistry Department of Rice
University, Houston, TX; and b) PROTEUS, developed by
Dayringer, Tramontano, and Fletterick (DAYR86).

Theoretical calculations, such as dynamic
simulations of proteins, are used to estimate the
effect of substitution at a particular residue of a
particular amino-acid type on the 3D structure of the
parent protein. Such calculations might also indicate
whether a particular substitution will greatly affect
the flexibility of the protein.



Sec. 13.1.1: The principal set:

Using the knowledge of which residues are on the
surface of the IPBD, we pick residues that are close
enough together on the surface of the IPBD to touch a
molecule of the target simultaneously without having
any IPBD main-chain atom come closer than van der Waals
distance (viz. 4.0 to 5.0 Å) from any target atom. A
residue of the IPBD "touches" the target if: a) a
main-chain atom is within van der Waals distance, viz.
4.0 to 5.0 Å, of any atom of the target molecule, or b)
the C_beta is within D_cutoff of any atom of the target
molecule so that a side-group atom could make contact
with that atom. Because side groups differ in size
(cf. Table 35), some judgment is required in picking
D_cutoff. In the preferred embodiment, we will use
D_cutoff = 8.0 Å, but other values in the range 6.0 Å
to 10.0 Å could be used. If IPBD has G at a residue, we
construct a pseudo C_beta with the correct bond distance
and angles and judge the ability of the residue to
touch the target from this pseudo C_beta.
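
The two "touch" criteria can be expressed as a simple distance test. The following is a minimal sketch, assuming atoms are given as (x, y, z) tuples in Å; the function name, the choice of 5.0 Å as the van der Waals threshold, and the calling convention are illustrative assumptions. For a G residue the caller is assumed to supply the pseudo C_beta position.

```python
import math

VDW_DISTANCE = 5.0   # upper end of the 4.0 to 5.0 A range (assumed choice)
D_CUTOFF = 8.0       # preferred C_beta cutoff from the text

def residue_touches(main_chain_atoms, c_beta, target_atoms,
                    vdw=VDW_DISTANCE, d_cutoff=D_CUTOFF):
    """Return True if the residue meets criterion a) or b) for any target atom."""
    for t in target_atoms:
        # a) a main-chain atom within van der Waals distance of a target atom
        if any(math.dist(m, t) <= vdw for m in main_chain_atoms):
            return True
        # b) C_beta (or pseudo C_beta for glycine) within D_cutoff
        if math.dist(c_beta, t) <= d_cutoff:
            return True
    return False

target = [(0.0, 0.0, 0.0)]
print(residue_touches([(10.0, 0.0, 0.0)], (7.5, 0.0, 0.0), target))
```

Raising or lowering d_cutoff within the 6.0 to 10.0 Å range enlarges or shrinks the candidate principal set.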

Alternatively, we choose a set of residues on the
surface of the IPBD such that the curvature of the
surface defined by the residues in the set is not so
great that it would prevent contact between all
residues in the set and a molecule of the target. This
method is appropriate if the target is a macromolecule,
such as a protein, because the PBDs derived from the
IPBD will contact only a part of the macromolecular
surface.

We prefer that there be some indication that the
underlying IPBD structure will tolerate substitutions
at each residue in the principal set of residues.
Indications could come from various sources, including:




a) homologous sequences, b) static computer modeling,
or c) dynamic computer simulations.

The residues in the principal set need not be
contiguous in the protein sequence. We require only
that the amino acids in the residues to be varied all
be capable of touching a molecule of the target
material simultaneously without having atoms overlap.
If the target were, for example, horse heart myoglobin,
and if the IPBD were BPTI, any set of residues in one
interaction set of BPTI defined in Table 34 could be
picked.

Preferably, the principal set contains eight to
sixteen residues. This number of residues allows
sufficient variability that a surface that is
complementary to the target can be found, but is small
enough that a significant fraction of the surface can
be varied at one time.

Sec. 13.1.2: The secondary set:

The secondary set comprises residues that touch
residues in the primary set, and are excluded from the
primary set because the residue: a) is internal, b) is
highly conserved, or c) is on the surface, but the
curvature of the IPBD surface prevents the residue from
being in contact with the target at the same time as
one or more residues in the primary set.

Internal residues, although frequently conserved,
may tolerate some conservative changes such as I to
L or F to Y. These changes affect the detailed placement
and dynamics of adjacent protein residues, and such
variation may be useful once an SBD is found.

Surface residues in the secondary set are most
often located on the periphery of the principal set
and do not make direct contact with the target
simultaneously with all other residues of the principal
set. The charge on the amino acid in one of these
residues could, however, have a strong effect on
binding. It is appropriate to vary the charge of some
or all of these residues to improve an SBD. For
example, the variegated codon containing equimolar A
and G at base 1, equimolar C and A at base 2, and A at
base 3 yields amino acids T, A, K, and E with equal
probability.
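
The outcome of such a variegated codon can be checked by enumerating the codons it allows. The sketch below is illustrative only; it assumes the standard genetic code, and the helper name amino_acid_distribution is hypothetical.

```python
from itertools import product

# Standard genetic code, bases ordered T, C, A, G at each position.
BASES = "TCAG"
AMINO = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {a + b + c: AMINO[16 * i + 4 * j + k]
               for i, a in enumerate(BASES)
               for j, b in enumerate(BASES)
               for k, c in enumerate(BASES)}

def amino_acid_distribution(base1, base2, base3):
    """Each argument maps nt -> fraction at that codon position."""
    dist = {}
    for (n1, f1), (n2, f2), (n3, f3) in product(
            base1.items(), base2.items(), base3.items()):
        aa = CODON_TABLE[n1 + n2 + n3]
        dist[aa] = dist.get(aa, 0.0) + f1 * f2 * f3
    return dist

# Equimolar A/G at base 1, equimolar C/A at base 2, A at base 3.
charge_codon = amino_acid_distribution(
    {"A": 0.5, "G": 0.5}, {"C": 0.5, "A": 0.5}, {"A": 1.0})
print(charge_codon)
```

The four codons ACA, AAA, GCA, and GAA each occur with fraction 0.25, giving one neutral polar, one small, one basic, and one acidic residue.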

Sec. 13.1.3:

The allowed level of variegation that assures
progressivity determines how many residues can be
varied at once; geometry determines which ones.

The user may pick residues to vary in many ways; the
following is a preferred manner. Pairs of residues are
picked that are diametrically opposed across the face
of the principal set. Two such pairs are used to
delimit the surface, up/down and right/left.
Alternatively, three residues that form an inscribed
triangle, having as large an area as possible, on the
surface are picked. One to three other residues are
picked in a checkerboard fashion across the interaction
surface. Choice of widely spaced residues to vary
creates the possibility for high specificity because
all the intervening residues must have acceptable
complementarity before favorable interactions can occur
at widely separated residues.

The number of residues picked is coupled to the
range through which each can be varied by the
restrictions discussed in Sec. 13.2. In the first
round, we do not assume any binding between IPBD and
the target and so progressivity is not an issue. At
the first round, the user may elect to produce a level
of variegation such that each molecule of vgDNA is
potentially different through, for example, unlimited
variegation of 10 codons (20^10 = approx. 10^13). One
run of the DNA synthesizer produces approximately 10^13
molecules of length 100 nts. Inefficiencies in
ligation and transformation will reduce the number of
proteins actually tested to between 10^7 and 5 x 10^8.
Multiple iterations of the process with such very high
levels of variegation will not yield repeatable
results; the user must decide whether this is
important.

Sec. 13.2: Range of variation at each site of
mutation:

The total level of variegation is the product of
the number of variants at each varied residue. Each
varied residue can have a different scheme of
variegation, producing 2 to 20 different possibilities.
We require that the process be progressive, i.e. each
variegation cycle produces a better starting point for
the next variegation cycle than the previous cycle
produced.
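
As a worked illustration of this product rule, the sketch below computes the total level of variegation for two schemes. The function name and the second, mixed scheme are hypothetical examples; only the first (ten residues varied through all twenty amino acids) comes from the text.

```python
import math

def total_variegation(variants_per_residue):
    """Total level of variegation = product of the variants at each residue."""
    return math.prod(variants_per_residue)

unlimited = total_variegation([20] * 10)       # ten codons, all 20 amino acids
mixed = total_variegation([20, 20, 4, 4, 2])   # hypothetical mixed scheme
print(f"{unlimited:.3e}", mixed)
```

Comparing such totals with the number of independent transformants obtainable indicates whether a given scheme can remain progressive.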

N.B.: Setting the level of variegation such
that the ppbd and many sequences related to
the ppbd sequence are present in detectable
amounts insures that the process is
progressive. If the level of variegation is
so high that the ppbd sequence is present at
such low levels that there is an appreciable
chance that no transformant will display the
PPBD, then the best SBD of the next round
could be worse than the PPBD. At excessively
high level of variegation, each round of
mutagenesis is independent of previous rounds
and there is no assurance of progressivity.
This approach can lead to valuable binding
proteins, but repetition of experiments with
this level of variegation will not yield
progressive results. Excessive variation is
not preferred.



If the level of variegation is such that the
parental sequence and each single amino-acid change is
present for selection, then we know that a selected
sequence is closer to optimal or the same as the
parent. If, on the other hand, very high levels of
variegation are used, a sequence may be selected, not
because it is superior to the parental sequence, but
because the parental and improved sequences are, by
chance, absent.



Progressivity is not an all-or-nothing property.
So long as most of the information obtained from
previous variegation cycles is retained and many
different surfaces that are related to the PPBD surface
are produced, the process is progressive. If the level
of variegation is so high that the ppbd gene may not be
detected, the assurance of progressivity diminishes.
If the probability of recovering PPBD is negligible,
then the probability of progressive behavior is also
negligible.



An opposing force in our design considerations is
that PBDs are useful in the population only up to the
amount that can be detected; any excess above the
detectable amount is wasted. Thus we produce as many
surfaces related to PPBD as possible within the
constraint that the PPBD be detectable.

We defer specification of exactly how much
variegation is allowed until we have: a) specified real
nt distributions for a variegated codon, and b)
examined the effects of discrepancies between specified
nt distributions and actual nt distributions.

We must now decide how to distribute the
variegation within the codons for the residues to be
varied. These decisions are influenced by the nature
of the genetic code. When vgDNA is synthesized,
variation at the first base of a codon creates a
population containing amino acids from the same column
of the genetic code table (as shown in the Table 3-5 on
p87 of WATS87); variation at the second base of the
codon creates a population containing amino acids from
the same row of the genetic code table; variation at
the third base of the codon creates a population
containing amino acids from the same box. If two or
three bases in the same codon are varied, the pattern
is more complicated. Work with 3D protein structural
models may suggest definite sets of amino acids to
substitute at a given residue, but the method of
variation may require that either more or fewer kinds
of amino acids be included. For example, examination of
a model might suggest substitution of N or Q at a given
residue. Combinatorial variation of codons requires




that mixing N and Q at one location also include K and
H as possibilities at the same residue. One must
choose to put: 1) N only, 2) Q only, or 3) a mixture of
N, K, H, and Q. The present invention does not rely on
accurate predictions of which amino acids should be
placed at each residue; rather, attention is focused on
which residues should be varied.
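
The N/Q example can be verified by enumerating the codons produced by the mixed bases. This is a minimal sketch assuming the standard genetic code and, as an illustrative choice, an equimolar T/G mixture at the third base (N requires T or C there, Q requires A or G); the helper name is hypothetical.

```python
from itertools import product

# Standard genetic code, bases ordered T, C, A, G at each position.
BASES = "TCAG"
AMINO = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {a + b + c: AMINO[16 * i + 4 * j + k]
               for i, a in enumerate(BASES)
               for j, b in enumerate(BASES)
               for k, c in enumerate(BASES)}

def encoded_amino_acids(bases1, bases2, bases3):
    """Map each allowed codon to its amino acid for the given base choices."""
    return {n1 + n2 + n3: CODON_TABLE[n1 + n2 + n3]
            for n1, n2, n3 in product(bases1, bases2, bases3)}

# N is AAT/AAC and Q is CAA/CAG, so both base 1 and base 3 must be mixed.
n_and_q = encoded_amino_acids("AC", "A", "TG")
print(n_and_q)
```

Mixing base 1 (A or C) and base 3 (T or G) to reach both N and Q unavoidably admits AAG (K) and CAT (H) as well.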

There are many ways to generate diversity in a
protein. (See RICH86, CARU85, and OLIP86.) One extreme
case is that one or a few residues of the protein are
varied as much as possible (inter alia see CARU85 and
RICH86). We will call this limit "Focused
Mutagenesis". Focused Mutagenesis is appropriate when
the IPBD or other PPBD shows little or no binding to
the target, as at the beginning of the search for a
protein to bind to a new target material. When there is
no binding between the PPBD and the target, we
preferably pick a set of five to seven residues and
vary each through all 20 possibilities.

An alternative plan of mutagenesis ("Diffuse
Mutagenesis") is to vary many more residues through a
more limited set of choices (See Vershon et al., Ch 15
of INOU86 and PAKU86). This can be accomplished by
spiking each of the pure nts activated for DNA
synthesis (e.g. nt-phosphoramidites) with a small
amount of one or more of the other activated nts.
Contrary to general practice, the present invention
sets the level of spiking so that only a small
percentage (1% to .00001%, for example) of the final
product contains the initial DNA sequence. Many
single, double, triple, and higher mutations occur, but
recovery of the basic sequence is a possible outcome.
Let N_b be the number of bases to be varied, and let Q
be the fraction of all sequences that should have the
parental sequence; then M, the fraction of the mixture
that is the majority component, is

     M = exp{ log_e(Q) / N_b } = 10^( log_10(Q) / N_b )



If, for example, thirty base pairs on the DNA
chain were to be varied and 1% of the product is to
have the parental sequence, then each mixed nt
substrate should contain 86% of the parental nt and 14%
of other nts. Table 8 shows the fraction (fn) of DNA
molecules having n non-parental bases when 30 bases are
synthesized with reagents that contain fraction M of
the majority component. When M = .63096, f24 and higher
are less than 10^-8. The entry "most" in Table 8 is the
number of changes that has the highest probability.
Note that substantial probability for multiple
substitutions only occurs if the fraction of parental
sequence (f0) is allowed to drop to around 10^-6.
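
The relation between Q, N_b, and M, and the resulting distribution of the number of changes, can be sketched as follows. The sketch assumes that each of the N_b bases is substituted independently, so that fn follows a binomial distribution; the function names are illustrative, and the code is not the program used to generate Table 8.

```python
import math

def majority_fraction(n_bases, parental_fraction):
    """M such that M ** n_bases equals the desired parental fraction Q."""
    return math.exp(math.log(parental_fraction) / n_bases)

def fraction_with_n_changes(n_bases, m, n):
    """Binomial fraction fn of molecules with exactly n non-parental bases."""
    return math.comb(n_bases, n) * (1.0 - m) ** n * m ** (n_bases - n)

def most_probable_changes(n_bases, m):
    """Number of changes with the highest probability (the 'most' entry)."""
    return max(range(n_bases + 1),
               key=lambda n: fraction_with_n_changes(n_bases, m, n))

m = majority_fraction(30, 0.01)
print(f"M = {m:.4f}, most probable number of changes = "
      f"{most_probable_changes(30, m)}")
```

With thirty bases and Q = 1%, M comes out near 0.86, matching the 86%/14% mixture described above; setting Q = 10^-6 gives M near 0.63096.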

Mutagenesis of this sort can be applied to any part of
the protein at any time, but is most appropriate when
some binding to the target has been established. The
N_b base pairs of the DNA chain that are synthesized
with mixed reagents need not be contiguous. They are
picked so that between N_b/3 and N_b codons are affected
to various degrees. The residues picked for mutation
are picked with reference to the 3D structure of the
IPBD, if known. For example, one might pick all or
most of the residues in the principal and secondary
set. We may impose restrictions on the extent of
variation at each of these residues based on homologous
sequences or other data. The mixture of non-parental
nts need not be random; rather, mixtures can be biased
to give particular amino acid types specific
probabilities of appearance at each codon. For
example, one residue may contain a hydrophobic amino
acid in all known homologous sequences; in such a case,
the first and third base of that codon would be varied,
but the second would be set to T. This diffuse
structure-directed mutagenesis will reveal the subtle
changes possible in protein backbone associated with
conservative interior changes, such as V to I, as well
as some not so subtle changes that require concomitant
changes at two or more residues of the protein.

For Focused Mutagenesis, we now consider the
distribution of nts that will be inserted at each
variegated codon. Each codon could be programmed
differently. If we have no information indicating that
a particular amino acid or class of amino acid is
appropriate, we strive to substitute all amino acids
with equal probability because representation of one
PBD above the detectable level is wasteful. Equal
amounts of all four nts at each position in a codon
yield the amino acid distribution in which each amino
acid is present in proportion to the number of codons
that code for it. This distribution has the
disadvantage of giving two basic residues for every
acidic residue. In addition, six times as much R, S,
and L as W or M occur. If five codons are synthesized
with this distribution, sequences encoding five Rs are
7776 times more abundant than sequences encoding five
Ws. To have W-W-W-W-W present at detectable levels, we
must have R-R-R-R-R present in 7776-fold excess.
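
These figures follow directly from counting codons in the standard genetic code. The short sketch below reproduces them; it assumes equimolar nts at all three bases, and the variable names are illustrative.

```python
from collections import Counter

# Standard genetic code as one string, bases ordered T, C, A, G ('*' = stop).
AMINO = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")

# With equimolar nts, abundance is proportional to the number of codons.
codon_counts = Counter(AMINO)

basic = codon_counts["K"] + codon_counts["R"]
acidic = codon_counts["D"] + codon_counts["E"]
excess_5_codons = (codon_counts["R"] // codon_counts["W"]) ** 5

print(basic, acidic, excess_5_codons)
```

R, S, and L each have six codons while W and M have one, so five consecutive Rs outnumber five consecutive Ws by 6^5.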

Let Abun(x) be the abundance of DNA sequences
coding for amino acid x, defined by the distribution of
nts at each base of the codon. For any distribution,
there will be a most-favored amino acid (mfaa) with
abundance Abun(mfaa) and a least-favored amino acid
(lfaa) with abundance Abun(lfaa). We seek the nt
distribution that allows all twenty amino acids and
that yields the largest ratio Abun(lfaa)/Abun(mfaa)
subject to two constraints: equal abundances of acidic
and basic amino acids and the least possible number of
stop codons. Thus only nt distributions that yield
Abun(E)+Abun(D) = Abun(R)+Abun(K) are considered, and
the function maximized is:

     { (1-Abun(stop)) (Abun(lfaa)/Abun(mfaa)) }

We have simplified the search for an optimal nt
distribution by limiting the third base to T or G (C or
G is equivalent). All amino acids are possible and the
number of accessible stop codons is reduced because TGA
and TAA codons are eliminated. The amino acids F, Y,
C, H, N, I, and D require T at the third base while W,
M, Q, K, and E require G. Thus we use an equimolar
mixture of T and G at the third base.

A computer program, written as part of the present
invention and named "Find Optimum vgCodon" (See Table
9), varies the composition at bases 1 and 2, in steps
of 0.05, and reports the composition that gives the
largest value of the quantity {(Abun(lfaa)/Abun(mfaa))
(1-Abun(stop))}. A vgCodon is symbolically defined
by the nt distribution at each base:

     t3 = g3 = 0.5, c3 = a3 = 0.

The variation of the quantities t1, c1, a1, g1, t2, c2,
a2, and g2 is subject to the constraint that
Abun(E)+Abun(D) equals Abun(K)+Abun(R):

     Abun(E)+Abun(D) = g1*a2

     Abun(K)+Abun(R) = a1*a2/2 + c1*g2 + a1*g2/2

     g1*a2 = a1*a2/2 + c1*g2 + a1*g2/2

Solving for g2, we obtain

     g2 = (g1*a2 - 0.5*a1*a2)/(c1 + 0.5*a1).

In addition,

     t1 = 1 - a1 - c1 - g1
     t2 = 1 - a2 - c2 - g2

We vary a1, c1, g1, a2, and c2 and then calculate t1,
g2, and t2. Initially, variation is in steps of 5%.
Once an approximately optimum distribution of nts is
determined, the region is further explored with steps
of 1%. The logic of this program is shown in Table 9.
The optimum distribution is:

     Optimum vgCodon

                  T      C      A      G
     base #1 =   0.26   0.18   0.26   0.30
     base #2 =   0.22   0.16   0.40   0.22
     base #3 =   0.5    0.0    0.0    0.5

and yields DNA molecules encoding each type amino acid
with the abundances shown in Table 10.
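
The stated optimum can be checked against the constraints described above. The following is a minimal sketch, not the program of Table 9: it assumes the standard genetic code, computes amino-acid abundances for the optimum distribution, and evaluates the acidic/basic balance, the g2 formula, and the maximized quantity. All function names are illustrative.

```python
from itertools import product

# Standard genetic code, bases ordered T, C, A, G ('*' = stop).
BASES = "TCAG"
AMINO = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {a + b + c: AMINO[16 * i + 4 * j + k]
               for i, a in enumerate(BASES)
               for j, b in enumerate(BASES)
               for k, c in enumerate(BASES)}

OPTIMUM = ({"T": 0.26, "C": 0.18, "A": 0.26, "G": 0.30},
           {"T": 0.22, "C": 0.16, "A": 0.40, "G": 0.22},
           {"T": 0.5, "G": 0.5})

def abundances(base1, base2, base3):
    """Abun(x) for every amino acid (and stop) given nt fractions per base."""
    abun = {}
    for (n1, f1), (n2, f2), (n3, f3) in product(
            base1.items(), base2.items(), base3.items()):
        aa = CODON_TABLE[n1 + n2 + n3]
        abun[aa] = abun.get(aa, 0.0) + f1 * f2 * f3
    return abun

def solve_g2(a1, c1, g1, a2):
    """g2 that balances acidic (D+E) against basic (K+R) abundance."""
    return (g1 * a2 - 0.5 * a1 * a2) / (c1 + 0.5 * a1)

def score(abun):
    """(1 - Abun(stop)) * Abun(lfaa) / Abun(mfaa)."""
    aa = {k: v for k, v in abun.items() if k != "*"}
    return (1.0 - abun.get("*", 0.0)) * min(aa.values()) / max(aa.values())

abun = abundances(*OPTIMUM)
print(f"D+E = {abun['D'] + abun['E']:.4f}, K+R = {abun['K'] + abun['R']:.4f}")
print(f"stop = {abun['*']:.4f}, score = {score(abun):.4f}")
```

A full search would loop a1, c1, g1, a2, and c2 over a grid, derive g2, t1, and t2 as above, and keep the composition with the largest score.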

The computer that controls a DNA synthesizer, such
as the Milligen 7500, can be programmed to synthesize
any base of an oligo-nt with any distribution of nts by
taking some nt substrates (e.g. nt phosphoramidites)
from each of two or more reservoirs. Alternatively, nt
substrates can be mixed in any ratios and placed in one
of the extra reservoirs for so-called "dirty bottle"
synthesis.

The actual nt distribution obtained will differ
from the specified nt distribution due to several
causes, including: a) differential inherent reactivity
of nt substrates, and b) differential deterioration of
reagents. It is possible to compensate partially for
these effects, but some residual error will occur. We
denote the average discrepancy between specified and
observed nt fraction as S_err,

     S_err = square root { average [ ((f_obs - f_spec) / f_spec)^2 ] }

where f_obs is the amount of one type of nt found at a
base and f_spec is the amount of that type of nt that
was specified at the same base. The average is over
all specified types of nts and over a number (e.g. 10
or 20) of different variegated bases. By hypothesis, the
actual nt distribution at a variegated base will be
within 5% of the specified distribution. Actual DNA
synthesizers and DNA synthetic chemistry may have
different error levels. It is the user's
responsibility to determine S_err for the DNA
synthesizer and chemistry employed.
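
Given measured nt fractions, S_err can be computed as a root-mean-square relative discrepancy. The sketch below is one reading of the definition above, assuming the squared relative error is averaged over all nts that were actually specified (fraction greater than zero); the function name and sample values are hypothetical.

```python
import math

def s_err(observed, specified):
    """RMS relative discrepancy between observed and specified nt fractions.

    Both arguments are equal-length sequences of fractions; entries whose
    specified fraction is zero (nt not specified at that base) are skipped.
    """
    terms = [((o - s) / s) ** 2
             for o, s in zip(observed, specified) if s > 0]
    return math.sqrt(sum(terms) / len(terms))

# Hypothetical measurement: two nts each found 5% above the specified amount.
print(f"S_err = {s_err([0.273, 0.189], [0.26, 0.18]):.3f}")
```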

To determine the possible effects of errors in nt
composition on the amino-acid distribution, we modified
the program "Find Optimum vgCodon" in four ways:

1) the fraction of each nt in the first two bases
is allowed to vary from its optimum value times (1
- S_err) to the optimum value times (1 + S_err) in
seven equal steps (S_err is the hypothetical
fractional error level entered by the user); the
sum of nt fractions at one base always equals 1.0,

2) g2 is varied in the same manner as a2, i.e. we
dropped the restriction that Abun(D)+Abun(E) =
Abun(K)+Abun(R),

3) t3 and g3 are varied from 0.5 times (1 - S_err)
to 0.5 times (1 + S_err) in three equal steps,

4) the smallest ratio Abun(lfaa)/Abun(mfaa) is
sought.

In actual experiments, we will direct the synthesizer
to produce the optimum DNA distribution "Optimum
vgCodon" given above. Incomplete control over DNA
chemistry may, however, cause us to actually obtain the
following distribution that is the worst that can be
obtained if all nt fractions are within 5% of the
amounts specified in "Optimum vgCodon". A
corresponding table can be calculated for any given
S_err using the program "Find worst vgCodon within Serr
of given distribution." given in Table 11.




     Optimum vgCodon, worst 5% errors

                  T       C       A       G
     base #1 =   0.251   0.189   0.273   0.287
     base #2 =   0.209   0.160   0.400   0.231
     base #3 =   0.475   0.0     0.0     0.525

This distribution yields DNA encoding different
amino acids at the abundances shown in Table 12.

If five codons are synthesized with reagents mixed
so as to produce the nt distribution "Optimum vgCodon",
and if we actually obtained the nt distribution
"Optimum vgCodon, worst 5% errors", then DNA sequences
encoding the mfaa at all of the five codons are about
277 times as likely as DNA sequences encoding the lfaa
at all of the five codons; about 24% of the DNA
sequences will have a stop codon in one or more of the
five codons.

When five codons are synthesized using equimolar
mixtures at bases 1 and 2, (Abun(mfaa)/Abun(lfaa))^5 =
7776. If we program the optimum nt distribution and
come within 5%, then (Abun(mfaa)/Abun(lfaa))^5 = 277.
The total number of different PBDs is unchanged, but
the least-favored sequence is about 28 times more
abundant. Detecting the least-favored amino-acid
sequence when varying four residues with equimolar nts
at each varied base requires as sensitive a separation
system as does detecting the least-favored amino-acid
sequence when varying five residues with the optimized
nt distribution.
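
Two of the figures quoted above can be reproduced with simple arithmetic. The sketch below assumes that TAG is the only accessible stop codon (third base limited to T or G) and takes the worst-case fractions 0.251 (T at base 1), 0.400 (A at base 2), and 0.525 (G at base 3) from the "worst 5% errors" distribution; the function names are illustrative.

```python
def fold_improvement(ratio_equimolar=7776, ratio_optimized=277):
    """How much more abundant the least-favored five-codon sequence becomes."""
    return ratio_equimolar / ratio_optimized

def fraction_with_stop(stop_abundance, n_codons=5):
    """Fraction of sequences with a stop codon in at least one varied codon."""
    return 1.0 - (1.0 - stop_abundance) ** n_codons

# Abundance of TAG under the worst-case distribution.
stop = 0.251 * 0.400 * 0.525

print(f"improvement = {fold_improvement():.1f}-fold")
print(f"sequences with a stop codon = {fraction_with_stop(stop):.1%}")
```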

By hypothesis, the distribution "Optimal vgCodon"
is used in the second version of the second variegation
of hypothetical example 2. The abundance of the DNA
encoding each type of amino acid is, however, taken
from the Table 12. The abundance of DNA encoding the
parental amino acid sequence is:

     Amount (parental seq.)
            F24       G30       D34       E42       T47
        = Abun(F) x Abun(G) x Abun(D) x Abun(E) x Abun(T)
        =  .0249  x  .0663  x  .0545  x  .0602  x  .0437
        = 2.4 x 10^-7
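
The product above can be verified directly. This is a small illustrative check using the five abundances quoted from Table 12; the dictionary keys are simply the residue labels used in the text.

```python
import math

# Abundances of the parental amino acid at each of the five varied residues.
parental_abundance = {
    "F24": 0.0249,
    "G30": 0.0663,
    "D34": 0.0545,
    "E42": 0.0602,
    "T47": 0.0437,
}

amount_parental = math.prod(parental_abundance.values())
print(f"Amount(parental seq.) = {amount_parental:.1e}")
```

Multiplying this fraction by the number of independent transformants gives the expected number of packages that carry the parental sequence.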



Therefore, DNA encoding the PPBD sequence as well as
very many related sequences will be present in
sufficient quantity to be detected and we are assured
that the process will be progressive.

A level of variegation that allows recovery of the
PPBD has two properties:

1) we cannot regress because the PPBD is
available,

2) an enormous number of multiple changes related
to the PPBD are available for selection and we are
able to detect and benefit from these changes.



The user must adjust the list of residues to be
varied and levels of variegation at each residue until
the calculated variegation is within the bounds set by
M_ntv and C_sensi.

Preferably, we also consider the interactions
between the sites of variegation and the surrounding
DNA. If the method of mutagenesis to be used is
replacement of a cassette, we consider whether the
variegation will generate gratuitous restriction sites
and whether they seriously interfere with the intended
introduction of diversity. We reduce or eliminate
gratuitous restriction sites by appropriate choice of
variegation pattern and silent alteration of codons
neighboring the sites of variegation. See the Detailed
Example.

Sec. 14.1: Insertion of synthetic vgDNA into a
plasmid:

For cassette mutagenesis, restriction sites that were
designed and synthesized are used to introduce the
synthetic vgDNA into the OCV. Restriction digestions
and ligations are performed by standard methods
(AUSU87). In the case of single-stranded-
oligonucleotide-directed mutagenesis, synthetic vgDNA
is used to create diversity in the vector (BOTS85).

Sec. 14.2: Transformation of cells:

The present invention is not limited to any one
method of transforming cells with DNA. Standard
methods, such as those described in MANI82, may be
optimized for the particular host cells and OCV. The
goal is to produce a large number of independent
transformants, preferably 10^7 or more. It is not
necessary to isolate transformed cells between
transformation and affinity separation. We prefer to
have transformed cells at high concentration so that
they can be plated densely on relatively few plates.

Sec. 14.3: Growth of the GP(vgPBD) population:

The transformed cells are grown first under
non-selective conditions that allow expression of plasmid
genes and then selected to kill untransformed cells.
Transformed cells are then induced to express the
osp-pbd gene at the appropriate level of induction, as
determined in Sec. 10.1. The GPs carrying the IPBD are
harvested by a method appropriate to the package.

A high level of diversity can be generated by in
vitro variegated synthesis of DNA and this diversity
can be maintained passively through several generations
in an organism without positive selective pressure.
Loss or reduction in frequency of deleterious mutations
is advantageous for the purposes of the present
invention. It is preferable that the selection be
performed before more than a few generations elapse.
Moreover, subdividing the variegated population before
amplification in an organism by removing a small sample
(less than 10%) for further work would result in loss
of diversity; therefore, one should use all or most of
the synthetic DNA and most or all of the transformed
cells.

Sec. 15: Isolation of GP(PBD)s with binding-to-target
phenotypes:



The harvested packages are enriched for the
binding-to-target phenotype by use of affinity
separation involving target material immobilized on a
matrix. Packages that fail to bind to target material
are washed away. If the packages are bacteriophage or
endospores, it may be desirable to include a
bacteriocidal agent, such as azide, in the buffer to
prevent bacterial growth.

Affinity column chromatography is the preferred
method of affinity separation, but other affinity
separation methods may be used. A variety of
commercially available support materials for affinity
chromatography are used. These include derivatized
beads to which the target material is covalently
linked, or non-derivatized material to which the target
material adheres irreversibly.

Suppliers of support material for affinity
chromatography include: Applied Protein Technologies,
Cambridge, MA; Bio-Rad Laboratories, Rockville Center,
NY; Pierce Chemical Company, Rockford, IL. Target
materials are attached to the matrix in accord with the
directions of the manufacturer of each matrix
preparation with consideration of good presentation of
the target.

Sec. 15.2: Reduction of non-specific binding:

We reduce non-specific binding of GP(PBD)s to the
matrix that bears the target in two ways:

1) we treat the column with blocking agents such
as genetically defective GPs or a solution of
protein before the population of GP(vgPBD)s is
chromatographed, and

2) we pass the population of GP(vgPBD)s over a
matrix containing no target or a different target
from the same class as the actual target prior to
affinity chromatography.

Step (1) above saturates any non-specific binding that
the affinity matrix might show toward wild-type GPs or
proteins in general; step (2) removes components of our
population that exhibit non-specific binding to the
matrix or to molecules of the same class as the target.
If the target were horse heart myoglobin, for example,
a column supporting bovine serum albumin could be used
to trap GPs exhibiting PBDs with strong non-specific
binding to proteins. If cholesterol were the target,
then a hydrophobic compound, such as
p-tertiarybutylbenzyl alcohol, could be used to remove
GPs displaying PBDs having strong non-specific binding
to hydrophobic compounds. It is anticipated that PBDs
that fail to fold or that are prematurely terminated
will be non-specifically sticky. The capacity of the
initial column that removes indiscriminately adhesive
PBDs should be greater (e.g. 5 fold greater) than the
column that supports the target molecule.

Variation in the support material (polystyrene,
glass, agarose, etc.) in analysis of clones carrying
SBDs is used to eliminate enrichment for packages that
bind to the support material rather than the target.

Sec. 15.3:

The population of GPs is applied to an affinity
matrix under conditions compatible with the intended
use of the binding protein and the population is
fractionated by passage of a gradient of some solute
over the column. The process enriches for PBDs having
affinity for the target and for which the affinity for
the target is least affected by the eluants used.

Ions or cofactors needed for stability of PBDs
(derived from IPBD) or target must be included in
buffers at appropriate levels. We first remove
GP(PBD)s that do not bind the target by washing the
matrix with the volume of the initial buffer required
to bring the optical density (at 260 nm or 280 nm) back
to base line plus one to five void volumes (V_v). The
column is then eluted with a gradient of increasing: a)
salt, b) [H+] (decreasing pH), c) neutral solutes, d)
temperature (increasing or decreasing), or e) some
combination of these factors. Salt is the most
preferred solute for gradient formation. Other solutes
that generally weaken non-covalent interaction may also
be used. "Salt" includes solutions containing any of
the following ionic species:



    Ca++       Mg++
    Li+        Sr++
    Rb+        Cs+        Cl-        Br-
    SO4--      HSO4-      PO4---
    H2PO4-     CO3--      HCO3-      Acetate
    Citrate    Standard Amino Acids    I-
    Standard nucleotides    Guanidinium Cl

Other ionic or neutral solutes may be used. All
solutes are subject to the necessity that they not kill
the genetic packages. Neutral solutes, such as
ethanol, acetone, ether, or urea, are frequently used
in protein purification; however, many of these are
very harmful to bacteria and bacteriophage above low
concentrations. Bacterial spores, on the other hand,
are impervious to most neutral solutes. Several passes
may be made through the steps in Sec. 13. Different
solutes may be used in different analyses, salt in one,
pH in the next, etc.

Recovery of packages that display binding to an
affinity column may be achieved in several ways,
including from:

1) fractions eluted with a gradient as described
above,
2) fractions eluted with soluble target material,
3) cells grown in situ on the matrix,
4) cells incubated with parts of the matrix,
5) fractions eluted after chemically or
enzymatically degrading the linkage holding the
target to the matrix, and
6) regeneration of GPs after degrading the
packages and recovering OCV.

It is possible to utilize combinations of these
methods. It should be remembered that what we want to
recover from the affinity matrix is not the GPs per se,
but the information in them. Recovery of viable GPs is
very strongly preferred, but recovery of genetic
material is essential.

Inadvertent inactivation of the GPs is very
deleterious. It is preferred that maximum limits for
solutes that do not inactivate the GPs or denature the
target or the column are determined. One may use
conditions that denature the column to elute GPs;
before the target is denatured, a portion of the
affinity matrix should be removed for possible use as
an inoculum. As the GPs are held together by
protein-protein interactions and other non-covalent molecular
interactions, there will be cases in which the
molecular package will bind so tightly to the target
molecules on the affinity matrix that the GPs can not
be washed off in viable form. This will only occur
when very tight binding has been obtained. In these
cases, methods (3) through (5) above can be used to
obtain the bound packages or the genetic messages from
the affinity matrix.

It is possible, by manipulation of the elution
conditions, to isolate SBDs that bind to the target at
one pH (pH_b) but not at another pH (pH_o). The
population is applied at pH_b and the column is washed
thoroughly at pH_b. The column is then eluted with
buffer at pH_o and GPs that come off at the new pH are
collected and cultured. Similar procedures may be used
for other solution parameters, such as temperature.
For example, GP(vgPBD)s could be applied to a column
supporting insulin. After eluting with salt to remove
GPs with little or no binding to insulin, we elute with
salt and glucose to liberate GPs that display PBDs that
bind insulin or glucose in a competitive manner.

Sec. 15.5: Amplifying the Enriched Packages:

Viable GPs having the selected binding trait are
amplified by culture in a suitable medium, or, in the
case of phage, infection into a host so cultivated. If
the GPs have been inactivated by the chromatography,
the OCV carrying the osp-pbd gene must be recovered
from the GP, and introduced into a new, viable host.

Sec. 15.6: Determining whether further enrichment is
needed:

The probability of isolating a GP with improved
binding increases by C_eff with each separation cycle.
Let N be the number of distinct amino-acid sequences
produced by the variegation. We want to perform K
separation cycles before attempting to isolate an SBD,
where K is such that the probability of isolating a
single SBD is 0.10 or higher.

K = the smallest integer >= log10(0.10 N)/log10(C_eff)

For example, if N were 1.0 x 10^7 and C_eff = 6.31 x 10^2
then log10(1.0 x 10^6)/log10(6.31 x 10^2) = 6.0000/2.8000
= 2.14. Therefore we would attempt to isolate SBDs
after the third separation cycle. After only two
separation cycles, the probability of finding an SBD is
(6.31 x 10^2)^2/(1.0 x 10^7) = 0.04 and attempting to
isolate SBDs might be profitable.
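
The cycle-count rule above can be sketched in a few lines of Python. This is only an illustration of the stated formula; the function names `cycles_needed` and `isolation_probability` are hypothetical, and the sketch assumes, as the text does, that each separation cycle multiplies the frequency of the desired package by a constant C_eff.

```python
import math


def cycles_needed(n_sequences: float, c_eff: float, p_target: float = 0.10) -> int:
    """Smallest integer K with K >= log10(p_target * N) / log10(C_eff)."""
    if n_sequences <= 0 or c_eff <= 1 or not 0 < p_target <= 1:
        raise ValueError("need N > 0, C_eff > 1 and 0 < p_target <= 1")
    ratio = math.log10(p_target * n_sequences) / math.log10(c_eff)
    # A population already at or above the target frequency needs no cycles.
    return max(0, math.ceil(ratio))


def isolation_probability(n_sequences: float, c_eff: float, cycles: int) -> float:
    """Estimated chance that a single isolate is an SBD after `cycles` passes."""
    # Assumption: one desired sequence among N, enriched C_eff-fold per cycle.
    return min(1.0, c_eff ** cycles / n_sequences)


if __name__ == "__main__":
    n, c = 1.0e7, 6.31e2          # values from the worked example in the text
    print(cycles_needed(n, c))
    print(isolation_probability(n, c, 2))
```

With the values of the worked example, the first call gives the third cycle and the second gives roughly 0.04 after two cycles.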

Clonal isolates from the last fraction eluted in
Sec. 15.3 containing any viable GPs, as well as clonal
isolates obtained by culturing an inoculum taken from
the affinity matrix, are cultured. If K separation
cycles have been completed, samples from a number, say
32, of these clonal isolates are tested for elution
properties on the {target} column. If none of the
isolated, genetically pure GPs show improved binding to
target, or if K cycles have not yet been completed,
then we pool and culture, in a manner similar to the
manner set forth in Sec. 14.3, the GPs from the last
few fractions eluted (see Sec. 15.4) that contained
viable GPs and from the GPs obtained by culturing an
inoculum taken from the column matrix. We then repeat
the enrichment procedure described in Sec. 15. This
cyclic enrichment may continue N_chrom passes or until
an SBD is isolated.



If one or more of the isolated GPs has improved
retention on the {target} column, we determine whether
the retention of the candidate SBDs is due to affinity
for the target material. Target material is attached
to a different support matrix at optimal density and
the elution volumes of candidate GP(SBD)s are measured.
We pick the candidate that either has the highest
elution volume or that is retained on the column after
elution. If none of the candidate GP(SBD)s has higher
elution volume than GP(PPBD of this round), then we
pool and culture the GPs from the last few fractions
that contained viable GPs and the GPs obtained by
culturing an inoculum taken from the column matrix. We
then repeat the enrichment procedure described in Sec. 15.

If all of the SBDs show binding that is superior
to PPBD of this round, we pool and culture the GPs from
the last fraction that contains viable GPs and from the
inoculum taken from the column. This population is
re-chromatographed at least one pass to fractionate
further the GPs based on affinity for the target.

If an RNA phage were used as GP, the RNA would
either be cultured with the assistance of a helper
phage or be reverse transcribed and the DNA amplified.
The amplified DNA could then be sequenced or subcloned
into suitable plasmids.

Sec. 15.7: Characterizing the Putative SBDs:

We characterize members of the population showing
desired binding properties by genetic and biochemical
methods. We obtain clonal isolates and test these
strains by genetic and affinity methods to determine
genotype and phenotype with respect to binding to
target. For several genetically pure isolates that
show binding, we demonstrate that the binding is caused
by the artificial chimeric gene by excising the osp-sbd
gene and crossing it into the parental GP. We also
ligate the deleted backbone of each GP from which the
osp-sbd is removed and demonstrate that each backbone
alone cannot confer binding to the target on the GP.
We sequence the osp-sbd gene from several clonal
isolates.

Sec. 15.8: Testing of Binding Affinity:

For one or more clonal isolates, we subclone the
sbd gene fragment, without the osp fragment, into an
expression vector such that each SBD can be produced as
a free protein. Each SBD protein is purified by normal
means, including affinity chromatography. Physical
measurements of the strength of binding are then made
on each free SBD protein by one of the following
methods: 1) alteration of the Stokes radius as a
function of binding of the target material, measured by
characteristics of elution from a molecular sizing
column such as agarose, 2) retention of radiolabeled
SBD on a spun affinity column to which has been affixed
the target material, or 3) retention of radiolabeled
target material on a spun affinity column to which has
been affixed the SBD. The measurements of binding for
each free SBD are compared to the corresponding
measurements of binding for the PPBD.



In each assay, we measure the extent of binding as
a function of concentration of each protein, and other
relevant physical and chemical parameters.

In addition, the SBD with highest affinity for the
target from each round is compared to the best SBD of
the previous round (IPBD for the first round) and to
the IPBD with respect to affinity for the target
material. Successive rounds of mutagenesis and
selection-through-binding yield increasing affinity
until desired levels are achieved.

If binding is not yet sufficient, we must decide
which residues to vary next (see Sec. 16.0).

Sec. 15.9: Other Affinity Separation Means:

FACS may be used to separate GPs that bind
fluorescent labeled target with the optimized
parameters determined in Part II. We discriminate
against artifactual binding to the fluorescent label by
using two or more different dyes, chosen to be
structurally different.

Electrophoretic affinity separation uses unaltered
target so that only other ions in the buffer can give
rise to artifactual binding. Artifactual binding to
the gel material gives rise to retardation independent
of field direction and so is easily eliminated. A
variegated population of GPs will have a variety of
charges.

First the variegated population of GPs is
electrophoresed in a gel that contains no target
material. The electrophoresis continues until the GPs
are distributed along the length of the lane. The
target-free lane in which the initial electrophoresis
is conducted is separated by a baffle from a gel that
contains target material. The
baffle is removed and a second electrophoresis is
conducted at right angles to the first. GPs that do
not bind target migrate with unaltered mobility while
GPs that do bind target will separate from the majority
that do not bind target. A diagonal line of
non-binding GPs will form. This line is excised and
discarded. Other parts of the gel are dissolved and
the GPs cultured.
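
The diagonal criterion can be illustrated with a small Python sketch. It assumes, beyond what the text states, that both electrophoresis runs use the same field strength and duration, so that a non-binding GP migrates the same distance in each dimension. The `Spot` type, the `off_diagonal` function, and the 1 mm tolerance are hypothetical names and values chosen for illustration.

```python
from typing import List, NamedTuple


class Spot(NamedTuple):
    name: str
    first_mm: float    # distance migrated in the target-free lane
    second_mm: float   # distance migrated at right angles, through target gel


def off_diagonal(spots: List[Spot], tolerance_mm: float = 1.0) -> List[Spot]:
    """Return spots whose second-dimension migration differs from the first.

    Non-binders keep their mobility and fall on the diagonal (second == first);
    anything farther than `tolerance_mm` from it is a candidate binder.
    """
    return [s for s in spots if abs(s.second_mm - s.first_mm) > tolerance_mm]


if __name__ == "__main__":
    gel = [
        Spot("gp_a", 10.0, 10.2),   # on the diagonal: excised and discarded
        Spot("gp_b", 25.0, 24.6),   # on the diagonal
        Spot("gp_c", 18.0, 9.0),    # retarded by target: kept and cultured
    ]
    print([s.name for s in off_diagonal(gel)])
```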

Sec. 16.0: The Next Variegation Cycle:

Which residues of the PBD should be varied in the
next variegation cycle? The general rule is to
preserve as much accumulated information as possible.
The amino acids just varied are the ones best
determined. The environment of other residues has
changed, so that it is appropriate to vary them again.
Because there are always more residues in the principal
and secondary sets than can be varied simultaneously,
we start by picking residues that either have never
been varied (highest priority) or that have not been
varied for one or more cycles. If we find that varying
all the residues except those varied in the previous
cycle does not allow a high enough level of diversity,
then residues varied in the previous cycle might be
varied again. For example, if the number of
independent transformants that can be produced and the
sensitivity of the affinity separation were such that
seven residues could be varied, and if the principal
and secondary sets contained 13 residues, we would
always vary seven residues, even though that implies
varying some residue twice in a row. In such cases, we
would pick the residues just varied that contain the
amino acids of highest abundance in the variegated
codons used.
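
The priority rule just described can be sketched as a simple ranking, shown here under stated assumptions. The names `pick_residues` and `run_cycles` are hypothetical. The sketch ranks residues by "never varied first, then least recently varied" and breaks ties by list order; it does not model the abundance-based tie-break for residues varied in the immediately preceding cycle.

```python
from typing import Dict, List


def pick_residues(candidates: List[int], last_varied: Dict[int, int],
                  n_vary: int) -> List[int]:
    """Choose n_vary residues: never-varied first, then least recently varied."""
    def priority(residue: int):
        cycle = last_varied.get(residue)
        # (0, 0) sorts ahead of every (1, cycle); older cycles sort earlier.
        return (0, 0) if cycle is None else (1, cycle)

    return sorted(candidates, key=priority)[:n_vary]   # sorted() is stable


def run_cycles(candidates: List[int], n_vary: int, n_cycles: int) -> List[List[int]]:
    """Apply pick_residues over several variegation cycles and record the picks."""
    last_varied: Dict[int, int] = {}
    history = []
    for cycle in range(1, n_cycles + 1):
        chosen = pick_residues(candidates, last_varied, n_vary)
        for residue in chosen:
            last_varied[residue] = cycle
        history.append(chosen)
    return history


if __name__ == "__main__":
    # 13 residues in the principal and secondary sets, 7 varied per cycle.
    for picks in run_cycles(list(range(13)), n_vary=7, n_cycles=3):
        print(picks)
```

In the 13-residue, seven-per-cycle case of the text, the second cycle takes the six residues not yet varied plus one residue from the first cycle.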

It is the accumulation of information that allows
the process to select those protein sequences that
produce binding between the SBD and the target. Some
interfaces between proteins and other molecules involve
twenty or more residues. Complete variation of twenty
residues would generate 10^26 different proteins. By
dividing the residues that lie close together in space
into overlapping groups of five to seven residues, we
can vary a large surface but never need to test more
than 10^7 to 10^9 candidates at once, a savings of 10^19
to 10^17 fold.

Having picked the residues to vary, we again set
the range of variegation for each residue according to
the principles set forth in Sec. 13.2, design the vgDNA
encoding the desired mutants (Sec. 13.3), clone the
vgDNA into GPs (Sec. 14), and select-by-binding-to-
target those GPs bearing SBDs (Sec. 15).

Sec. 17.1: Joint selections:

One may modify the affinity separation of the
method described to select a molecule that binds to
material A but not to material B. One needs to prepare
two selection columns, one with material A and the
other with material B. The population of genetic
packages is prepared in the manner described, but
before applying the population to A, one passes the
population over the B column so as to remove those
members of the population that have high affinity for
B. It may be necessary to amplify the population that
does not bind to B before passing it over A.
Amplification would most likely be needed if A and B
were in some ways similar and the PPBD has been
selected for having affinity for A.

For example, to obtain an SBD that binds A but not
B, three columns could be connected in series: a) a
column supporting some compound, neither A nor B, or
only the matrix material, b) a column supporting B, and
c) a column supporting A. A population of GP(vgPBD)s
is applied to the series of columns and the columns are
washed with the buffer of constant ionic strength that
is used in the application. The columns are uncoupled,
and the third column is eluted with a gradient to
isolate GP(PBD)s that bind A but not B.
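
The logic of the three columns in series can be sketched as successive filters. This is a deliberately crude model, not a description of real chromatography: each package is given a hypothetical affinity score per ligand, and a column is assumed to retain every package whose score for its ligand meets a threshold. All names and numbers below are illustrative assumptions.

```python
from typing import Dict, List, Tuple

Package = Dict[str, object]   # {"name": str, "affinity": {ligand: score}}


def pass_over(population: List[Package], ligand: str,
              threshold: float = 1.0) -> Tuple[List[Package], List[Package]]:
    """Split a population into (retained on column, flow-through)."""
    retained, flow_through = [], []
    for package in population:
        score = package["affinity"].get(ligand, 0.0)
        (retained if score >= threshold else flow_through).append(package)
    return retained, flow_through


def select_a_not_b(population: List[Package], threshold: float = 1.0) -> List[Package]:
    """Columns in series: matrix only, then B, then A; recover what A retains."""
    retained_by = {}
    flowing = population
    for ligand in ("matrix", "B", "A"):
        retained_by[ligand], flowing = pass_over(flowing, ligand, threshold)
    return retained_by["A"]


if __name__ == "__main__":
    population = [
        {"name": "sticky", "affinity": {"matrix": 5.0, "A": 2.0}},
        {"name": "binds_both", "affinity": {"A": 3.0, "B": 3.0}},
        {"name": "a_only", "affinity": {"A": 3.0}},
        {"name": "binds_none", "affinity": {}},
    ]
    print([p["name"] for p in select_a_not_b(population)])
```

Placing the matrix-only and B columns upstream means that indiscriminately sticky packages and B-binders never reach the A column.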

One can also generate molecules that bind to both
A and B. In this case we use a 3D model and mutate one
face of the molecule in question to get binding to A.
We then mutate a different face to produce binding to
B.

The materials A and B could be proteins that
differ at only one or a few residues. For example, A
could be a natural protein for which the gene has been
cloned and B could be a mutant of A that retains the
overall 3D structure of A. SBDs selected to bind A but
not B must bind to A near the residues that are mutated
in B. If the mutations were picked to be in the active
site of A (assuming A has an active site), then an SBD
that binds A but not B will bind to the active site of
A and is likely to be an inhibitor of A.

To obtain a protein that will bind to both A and
B, we can, alternatively, first obtain an SBD that
binds A and a different SBD that binds B. We can then
combine the genes encoding these domains so that a
two-domain single-polypeptide protein is produced. The
fusion protein will have affinity for both A and B.

One can also generate binding proteins with
affinity for both A and B, such that these materials
compete for the same site on the binding protein. We
guarantee competition by overlapping the sites for A
and B. We first create a molecule that binds to target
material A. We then vary a set of residues defined as:
a) those residues that were varied to obtain binding to
A, plus b) those residues close in 3D space to the
residues of set (a) but that are internal and so are
unlikely to bind directly to either A or B. Residues
in set (b) are likely to make small changes in the
positioning of the residues in set (a) such that the
affinities for A and B will be changed by small
amounts. Members of these populations are selected for
affinity to both A and B.

The method of the present invention can be used to
select proteins that do not bind to selected targets.
Consider a protein of pharmacological importance, such
as streptokinase, that is antigenic to an undesirable
extent. We can take the pharmacologically important
protein as IPBD and antibodies against it as target.
Residues on the surface of the pharmacologically
important protein would be variegated and GP(PBD)s that
do not bind to an antibody column would be collected
and cultured. Surface residues may be identified in
several ways, including: a) from a 3D structure, b)
from hydrophobicity considerations, or c) chemical
labeling. The 3D structure of the pharmacologically
important protein remains the preferred guide to
picking residues to vary, except now we pick residues
that are widely spaced so that we leave as little as
possible of the original surface unaltered.

Destroying binding frequently requires only that a
single amino acid in the binding interface be changed.
If polyclonal antibodies are used, we face the problem
that all or most of the strong epitopes must be altered
in a single molecule. Preferably, one would have a set
of monoclonal antibodies, or a narrow range of antibody
species. If we had a series of monoclonal antibody
columns, we could obtain one or more mutations that
abolish binding to each monoclonal antibody. We could
then combine some or all of these mutations in one
molecule to produce a pharmacologically important
protein recognized by none of the monoclonal
antibodies. Such mutants must be tested to verify that
the pharmacologically interesting properties have not
been altered to an unacceptable degree by the mutations.

Typically, polyclonal antibodies display a range
of binding constants for antigen. Even if we have only
polyclonal antibodies that bind to the
pharmacologically important protein, we may proceed as
follows. We engineer the pharmacologically important
protein to appear on the surface of a replicable GP.
We introduce mutations into residues that are on the
surface of the pharmacologically important protein or
into residues thought to be on the surface of the
pharmacologically important protein so that a
population of GPs is obtained. Polyclonal antibodies
are attached to a column and the population of GPs is
applied to the column at low salt. The column is
eluted with a salt gradient. The GPs that elute at the
lowest concentration of salt are those which bear
pharmacologically important proteins that have been
mutated in a way that eliminates binding to the
antibodies having maximum affinity for the
pharmacologically important protein. The GPs eluting
at the lowest salt are isolated and cultured. The
isolated SBD becomes the PPBD to further rounds of
variegation so that the antigenic determinants are
successively eliminated.

Sec. 17.3: Selection of PBDs for retention of
structure:

We can select for insertions or deletions that
preserve the 3D structure of known binding proteins.
Consider a GP that expresses BPTI on its surface. In
the bpti-osp gene, we can replace the codons for K26
and A27 with five variegated codons (3.2 x 10^6
sequences). K26 and A27 are in a turn and are far from
the trypsin binding surface. We use
selection-through-binding to isolate GPs expressing mutants of BPTI that
retain high, specific affinity for trypsin.
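
The replacement of two residues by five variegated positions can be sketched as follows. The 58-residue BPTI sequence used here is the commonly reported mature sequence and is an assumption of this sketch (it is not reproduced in this section and should be checked against the cited structures); `replace_segment`, `library_size`, and the insert "GSGSG" are hypothetical illustrations.

```python
# Assumed mature BPTI sequence (58 residues); verify before relying on it.
BPTI = "RPDFCLEPPYTGPCKARIIRYFYNAKAGLCQTFVYGGCRAKRNNFKSAEDCMRTCGGA"


def replace_segment(seq: str, first: int, last: int, insert: str) -> str:
    """Replace residues first..last (1-based, inclusive) with `insert`."""
    if not 1 <= first <= last <= len(seq):
        raise ValueError("segment out of range")
    return seq[:first - 1] + insert + seq[last:]


def library_size(n_codons: int, n_amino_acids: int = 20) -> int:
    """Distinct amino-acid sequences if each codon may encode any amino acid."""
    return n_amino_acids ** n_codons


if __name__ == "__main__":
    # Swap K26-A27 for one arbitrary five-residue member of the library.
    variant = replace_segment(BPTI, 26, 27, "GSGSG")
    print(len(BPTI), len(variant))   # net insertion of three residues
    print(library_size(5))           # 20**5 amino-acid sequences
```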

Sec. 17.4: Created binding proteins:

For each target, there are a large number of SBDs
that may be found by the method of the present
invention. To increase the probability that some PBD
in the population will bind to the target, we generate
as large a population as we can conveniently subject to
selection-through-binding. Key questions in management
of the method are "How many transformants can we
produce?", and "How small a component can we find
through selection-through-binding?". Geneticists
routinely find mutations with frequencies of one in
[...] using simple, powerful selections. The optimum
level of variegation is determined by the maximum
number of transformants and the selection sensitivity,
so that for any reasonable sensitivity we may use a
progressive process to obtain a series of proteins with
higher and higher affinity for the chosen target
material. Enrichments of 1000-fold by a single pass of
elution from an affinity plate have been demonstrated
(SMIT85).

Use of different variation schemes can yield
different binding proteins. For any given target, a
large plurality of proteins will bind to it. Thus, if
one binding protein turns out to be unsuitable for some
reason (e.g., too antigenic), the procedure can be
repeated with different variation parameters. For
example, one might choose different residues to vary or
pick a different nt distribution at variegated codons
so that a new distribution of amino acids is tested at
the same residues. Even if the same principal set of
residues is used, one might obtain a different SBD if
the order in which one picks subsets to be varied is
altered.

Sec. 17.5: Other modes of mutagenesis possible:

The modes of creating diversity in the population
of GPs discussed herein are not the only modes
possible. Any method of mutagenesis that preserves at
least a large fraction of the information obtained from
one selection and then introduces other mutations in
the same domain will work. The limiting factors are
the number of independent transformants that can be
produced and the amount of enrichment one can achieve
through affinity separation. Therefore the preferred
embodiment uses a method of mutagenesis that focuses
mutations into those residues that are most likely to
affect the binding properties of the PBD and are least
likely to destroy the underlying structure of the IPBD.

Other modes of mutagenesis might allow other GPs
to be considered. For example, the bacteriophage
lambda is not a useful cloning vehicle for cassette
mutagenesis because of the plethora of restriction
sites. One can, however, use
single-stranded-oligo-nt-directed mutagenesis on lambda without the need for
unique restriction sites. No one has used
single-stranded-oligo-nt-directed mutagenesis to introduce the
high level of diversity called for in the present
invention, but if it is possible, such a method would
allow use of phage with large genomes.






Presented below is a hypothetical
protocol for developing a new binding molecule derived
from BPTI with affinity for horse heart myoglobin
(HHMb) using the common E. coli bacteriophage M13 as
genetic package. It will be understood that some
further optimization, in accordance with the teachings
herein, may be necessary to obtain the desired results.
Possible modifications in the preferred method are
discussed immediately following various steps of the
hypothetical example.



By hypothesis, we set the following technical
capabilities:

    Y_DQ     500 ng/synthesis of ssDNA 100 bases long,
             10 ug/synthesis of ssDNA 60 bases long,
             1 mg/synthesis of ssDNA 20 bases long.
    M_DNA    100
    Y_pl     1 mg/l
    L_eff    0.1 % for blunt-blunt,
             4 % for sticky-blunt,
             11 % for sticky-sticky.

10 passes

In this example, we will use M13 as a replicable
GP and BPTI as IPBD. In Part I, we are concerned only
with getting BPTI displayed on the outer surface of an
M13 derivative. Variable DNA may be introduced in the
osp-ipbd gene, but not within the region that codes for
the trypsin-binding region of BPTI. Once BPTI is
displayed on the outer surface of an M13
derivative, we proceed to Part II to optimize the
affinity separation procedures.

For this example, we choose a filamentous
bacteriophage of E. coli, M13. We prefer phage over
vegetative bacterial cells because phage are much less
metabolically active. We prefer phage over spores
because the molecular mechanisms of the virion
formation and 3D structure of the virion are much
better understood than are the corresponding processes
of spore formation and structures of spores.



M13 is a very well studied bacteriophage, widely
used for DNA sequencing and as a genetic vector; it is
a typical member of the class of filamentous phages.
The relevant facts about M13 and other phages that will
allow us to choose among phages are cited in Sec.

Compared to other bacteriophage, filamentous phage
in general are attractive and M13 in particular is
especially attractive because:

1) the 3D structure of the virion is known,

2) the processing of the coat protein is well
understood,

3) the genome is expandable,

4) the genome is small,

5) the sequence of the genome is known,

6) the virion is physically resistant to shear,
heat, cold, guanidinium Cl, low pH, and high salt,

7) the phage is a sequencing vector so that
sequencing is especially easy, and

8) antibiotic-resistance genes have been cloned
into the genome with predictable results (HINE80).

Other criteria listed in Sec. 1.0 and 1.3 are
also satisfied. M13 is easily cultured and stored
(FRIT85), each infected cell yielding 100 to 1000 M13
progeny after infection. M13 has no unusual or
expensive media requirements and is easily harvested
and concentrated (SALI64, YAMA70, FRIT85). M13 is
stable toward physical agents: temperature (10% of
phage survive 30 minutes at 85°C), shear (Waring
blender does not kill), desiccation (not applicable),
radiation (not applicable), age (stable for years).

M13 is stable toward chemicals: pH (< 2.2
(SMIT85)), surface active agents (not applicable),
chaotropes (guanidinium HCl 6.0 M), ions (no specific
sensitivities), organic solvents (ether and other
organic solvents are lethal (MARV78)), proteases (not
applicable, HHMb not a protease). M13 is not known to
be sensitive to other enzymes.

M13 genome is 6423 b.p. and the sequence is known
(SCHA78). Because the genome is small, cassette
mutagenesis is practical on RF M13 (AUSU87), as is
single-stranded oligo-nt directed mutagenesis (FRIT85).
M13 is a plasmid and transformation system in itself,
and an ideal sequencing vector. M13 can be grown on
Rec- strains of E. coli. The M13 genome is expandable
(MESS78, FRIT85). M13 confers no advantage, but
doesn't lyse cells. The sequence of gene VIII is
known, and the amino acid sequence can be encoded on a
synthetic gene, using lacUV5 promoter and used in
conjunction with the LacI^q repressor. The lacUV5
promoter is induced by IPTG. Gene VIII protein is
secreted by a well studied process and is cleaved
between A23 and A24. Residues 18, 21, 22, and 23 of
gene VIII protein control cleavage. Mature gene VIII
protein makes up the sheath around the circular ssDNA.
The 3D structure of f1 virion is known at medium
resolution; the amino terminus of gene VIII protein is
on surface of the virion. No fusions to M13 gene VIII
protein have been reported. The 2D structure of M13
coat protein is implicit in the 3D structure. Mature
M13 gene VIII protein has only one domain. There are
four minor proteins: gene III, VI, VII, and IX. Each
of these minor proteins is present in about 5 copies
per virion and is related to morphogenesis or
infection. The major coat protein is present in more
than 2500 copies per virion.

Although no fusions of M13 gene VIII to other
genes have been reported, knowledge of the virion 3D
structure makes attachment of IPBD to the
amino terminus of mature M13 coat protein (M13 CP)
quite attractive. Should direct fusion of BPTI to M13
CP fail to cause BPTI to be displayed on the surface of
M13, we will vary part of the BPTI sequence and/or
insert short random DNA sequences between BPTI and M13
CP.

Smith (SMIT85) and de la Cruz et al. (CRUZ88) have
shown that insertions into gene III cause novel protein
domains to appear on the virion outer surface. If BPTI
can not be made to appear on the virion outer surface
by fusing the bpti gene to the m13cp gene, we will fuse
bpti to gene III either at the site used by Smith and
by de la Cruz et al. or to one of the termini. We will
use a second, synthetic copy of gene III so that some
unaltered gene III protein will be present.

The gene VIII protein is chosen as OSP because it
is present in many copies and because its location and
orientation in the virion are known. Note that any
uncertainty about the azimuth of the coat protein about
its own alpha helical axis is unimportant.

The 3D model of f1 indicates strongly that fusing
BPTI to the amino terminus of M13 CP is more likely to
yield a functional protein than any other fusion site.
(See Sec. 1.3.3).

The amino-acid sequence of M13 pre-coat (SCHA78),
called AA_seq1, is

    residues  1-23:  MKKSLVLKASVAVATLVPMLSFA
                                           ^ SP-I cleaves after A23
    residues 24-73:  AEGDDPAKAAFNSLQASATEYIGYAWAMVVVIVGATIGIKLFKKFTSKAS



The single-letter codes for amino acids and the codes
for ambiguous DNA are internationally recognized
(GEOR87). The best site for inserting a novel protein
domain into M13 CP is after A23 because SP-I cleaves
the precoat protein after A23, as indicated by the
arrow. Proteins that can be secreted will appear
connected to mature M13 CP at its amino terminus.
Because the amino terminus of mature M13 CP is located
on the outer surface of the virion, the introduced
domain will be displayed on the outside of the virion.
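
The fusion-site reasoning can be made concrete with a short sketch. The precoat sequence below is the commonly reported M13 gene VIII precoat (23-residue signal peptide plus 50-residue mature coat protein) and should be verified against SCHA78 before use. The functions `fuse_after_signal` and `mature`, and the placeholder domain "GSGS", are hypothetical. The sketch assumes the signal peptidase still cuts after A23 once a domain is inserted.

```python
# Assumed M13 gene VIII precoat: signal peptide (1-23) + mature coat (24-73).
PRECOAT = (
    "MKKSLVLKASVAVATLVPMLSFA"
    "AEGDDPAKAAFNSLQASATEYIGYAWAMVVVIVGATIGIKLFKKFTSKAS"
)
CLEAVAGE_AFTER = 23   # SP-I cleaves the precoat after A23


def fuse_after_signal(precoat: str, domain: str,
                      cleavage_after: int = CLEAVAGE_AFTER) -> str:
    """Insert `domain` between the signal peptide and the mature coat protein."""
    return precoat[:cleavage_after] + domain + precoat[cleavage_after:]


def mature(pre_protein: str, cleavage_after: int = CLEAVAGE_AFTER) -> str:
    """Protein left after the signal peptide is removed."""
    return pre_protein[cleavage_after:]


if __name__ == "__main__":
    fusion = fuse_after_signal(PRECOAT, "GSGS")   # "GSGS" stands in for the IPBD
    print(mature(PRECOAT))
    print(mature(fusion))   # the domain now sits at the amino terminus of M13 CP
```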

BPTI is chosen as IPBD of this example (See Sec.
2.1) because it meets or exceeds all the criteria: it
is a small, very stable protein with a well known 3D
structure. Marks et al. (MARK86) have shown that a
fusion of the phoA signal peptide gene fragment and DNA
coding for the mature form of BPTI caused native BPTI
to appear in the periplasm of E. coli, demonstrating
that there is nothing in the structure of BPTI to
prevent its being secreted.

Marks et al. (MARK87) also showed that the
structure of BPTI is stable even to the removal of one
of the cystine bridges. They did this by replacing
both C14 and C38 with either two alanines or two
threonines. The C14/C38 cystine bridge that Marks et
al. removed is the one very close to the scissile bond
in BPTI; surprisingly, both mutant molecules
functioned as trypsin inhibitors. This indicates that
BPTI is redundantly stable and so is likely to fold
into approximately the same structure despite numerous
surface mutations. Using the knowledge of homologues,
vide infra, we can infer which residues must not be
varied if the basic BPTI structure is to be maintained.

The 3D structure of BPTI has been determined at
high resolution by X-ray diffraction (HUBE77, MARQ83,
WLOD84, WLOD87a, WLOD87b), neutron diffraction
(WLOD84), and by NMR (WAGN87). In one of the X-ray
structures deposited in the Brookhaven Protein Data
Bank, "6PTI", there was no electron density for A58,
indicating that A58 has no uniquely defined
conformation. Thus we know that the carboxy group does
not make any essential interaction in the folded
structure. The amino terminus of BPTI is very near to
the carboxy terminus. Goldenberg and Creighton
reported on circularised BPTI and circularly permuted
BPTI (GOLD83). Some proteins homologous to BPTI have
more or fewer residues at either terminus.

BPTI has been called "the hydrogen atom of protein
folding" and has been the subject of numerous
experimental and theoretical studies (STAT87, SCHW87,
GOLD83, CHAZ83).

BPTI has the added advantage that at least 32
homologous proteins are known, as shown in Table 13. A
tally of ionizable groups is shown in Table 14 and the
composite of amino acid types occurring at each residue
is shown in Table 15.

BPTI is freely soluble and is not known to bind
metal ions. BPTI has no known enzymatic activity.
BPTI binds to trypsin, K_d = 6.0 x 10^-14 M (TSCH87).
BPTI is not toxic. If K15 of BPTI is changed to L,
there is no measurable binding between the mutant BPTI
and trypsin (TSCH87).

All of the conserved residues are buried; of the
seven fully conserved residues only G37 has noticeable
exposure. The solvent accessibility of each residue in
BPTI is given in Table 16, which was calculated from the
entry "6PTI" in the Brookhaven Protein Data Bank with a
solvent radius of 1.4 Å, the atomic radii given in
Table 7, and the method of Lee and Richards (LEEB71).
Each of the 51 non-conserved residues can accommodate
two or more kinds of amino acids. By independently
substituting at each residue only those amino acids
already observed at that residue, we could obtain
approximately 7 x 10^42 different amino acid sequences,
most of which will fold into structures very similar to
BPTI.
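
The size of such a sequence family follows from multiplying, residue by residue, the number of amino acid types already observed there. The following is a minimal Python sketch of that arithmetic. The function names and the short example tally are hypothetical and are not the values of Table 15.

```python
from math import prod


def count_sequences(types_per_residue):
    """Number of distinct sequences when every residue varies independently.

    types_per_residue: iterable of ints, one per variable residue, giving
    how many amino acid types are allowed at that residue.
    """
    return prod(types_per_residue)


def mean_types_per_residue(total_sequences, n_residues):
    """Geometric-mean number of types per residue implied by a total count."""
    return total_sequences ** (1.0 / n_residues)


# Hypothetical tally for four residues (illustration only).
example_tally = [2, 3, 5, 4]
example_total = count_sequences(example_tally)  # 2 * 3 * 5 * 4

# A total near 7 x 10^42 over 51 variable residues corresponds to an
# average of roughly seven allowed types per residue.
implied_mean = mean_types_per_residue(7e42, 51)
```

The second function only runs the estimate in reverse; it shows that a figure of this magnitude is consistent with a handful of observed types at each of the 51 variable positions.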

BPTI will be useful as a IPBD for macromolecules.
(See Sec. 2.1.1). BPTI and BPTI homologues bind tightly
and with high specificity to a number of enzymes.

BPTI is strongly positively charged except at very
high pH, thus BPTI is useful as IPBD for targets that
are not also strongly positive under the conditions of
intended use (see Sec. 2.1.2). There exist homologues
of BPTI, however, having quite different charges (viz.
SCI-III from Bombyx mori at -7 and the trypsin
inhibitor from bovine colostrum at -1). Once a
derivative of M13 is found that displays BPTI on its
surface, the sequence of the BPTI domain can be
replaced by one of the homologous sequences to produce
acidic or neutral IPBDs.

BPTI is not an enzyme (See Sec. 2.1.3). BPTI is
quite small; if this should cause a pharmacological
problem, two or more BPTI-derived domains may be joined
as in the human BPTI homologue that has two domains.

A derivative of M13 is the preferred OCV. (See
Sec. 3). A "phagemid" is a hybrid between a phage and
a plasmid, and is used in this invention.
Double-stranded plasmid DNA isolated from
phagemid-bearing cells is denoted by the standard
convention, e.g. pXY24. Phage prepared from these cells
would be designated XY24. Phagemids such as Bluescript
K/S (sold by Stratagene) are not suitable for our
purposes because Bluescript does not contain the full
genome of M13 and must be rescued by coinfection with
helper phage. Such coinfections could lead to genetic
recombination yielding heterogeneous phage unsuitable
for the purposes of the present invention.

The bacteriophage M13 bla 61 (ATCC 37033) is
derived from wild-type M13 through the insertion of the
beta lactamase gene (HINE80). This phage contains 8.13
kb of DNA. M13 bla cat 1 (ATCC 31040) is derived from
M13 bla 61 through the additional insertion of the
chloramphenicol resistance gene (HINE80); M13 bla cat 1
contains 9.88 kb of DNA. Although neither of these
variants of M13 contains the ColE1 origin of
replication, either could be used as a starting point
to construct a usable cloning vector for the present
example.



The OCV for the current example is constructed by
a process illustrated in Figure 4. A brief description
of all the plasmids and phagemids constructed for this
Example is found in Table 17.

For ss oligo-nt site-directed mutagenesis,
multiple primers lead to higher efficiency. Three
non-mutagenic primers are used: bases 2326-2352 of wt M13,
Id-e 0 4854-48 5 > 03 5 ar<- the bases 

3431-3451 of pBR322. Note that pLG2 and its
derivatives carry the anti-sense strand of the amp^R
gene in the + DNA strand. The segments are picked to
be high in GC content and to divide the pLG7 genome
into several segments of approximately equal length.

The genetic engineering procedures needed to
construct the OCV are standard, using commercially
available restriction enzymes under recommended
conditions. All restriction fragments of DNA are
purified by electrophoresis or HPLC. M13 and its
engineered derivatives are infected into E. coli strain
PE384 (F+, Rec-, Sup+, Amp^S). Plasmid DNA of M13
derivatives is transformed into E. coli strain PE383
(F-, Rec-, Sup+, Amp^S) so that we avoid multiple
rounds of infection in the culture. Isolation of M13
phage is by the procedure of Salivar et al. (SALI64);
isolation of replicative form (RF) M13 is by the
procedure of Jazwinski et al. (JAZW73a and JAZW73b).
Isolation of plasmids containing the ColE1 origin of
replication is by the method of Maniatis (MANI82).

We pick the amp^R gene from pBR322 as a convenient
antibiotic resistance gene. Another resistance gene,
such as kanamycin, could be used. The AatII-to-AccI
fragment of pBR322 is a conveniently obtained source of
amp^R and the ColE1 origin.

M13mp18 (New England BioLabs) contains neither Aat
II nor Acc I sites. Therefore we insert an adaptor
that allows us to insert the AatII-to-AccI fragment
of pBR322 that carries the amp^R gene and the ColE1
origin of replication into a desirable place in
M13mp18. M13mp18 contains a lacUV5 promoter and a lacZ
gene that are not useful to the purposes of the present
invention. By cutting M13mp18 with AvaII and Bsu36I
and discarding the fragment of about 500 base
pairs, we eliminate all recognition sites of several
enzymes useful for engineering the bpti-gene VIII gene.

The following adaptor is synthesized:

( < . ~ 3 s oiie#i 

3< ' > -\' ,v3\ , ^ h\ ^ ' ; • ; J ,\\ 5 ; oiia« 

.1 Acoli Ksrl Xj. .iHea3«I 

The annealed adaptor is ligated with RF M13mp18
that has been cut with both AvaII and Bsu36I and
purified by PAGE or HPLC. Transformed cells are
selected for plasmid uptake with ampicillin. The
resulting construct is called pLG1.

DNA from pLG1 is cut with both Aat II and Acc I.
The AatII-to-AccI fragment of pBR322 is ligated to the
backbone of pLG1. The correct construct is named pLG2.

The Acc I restriction site is no longer needed for
vector construction. To eliminate this site, RF pLG2
dsDNA is cut with Acc I, treated with Klenow fragment
and dATP and dTTP to make it blunt and then religated.
The cloning vector, named pLG3, is now ready for
stepwise insertion of the osp-ipbd gene.

We are now ready to design a gene (See Sec. 4)
that will cause BPTI domains to appear on the outer
surface of an M13 derivative: LG7.

To obtain a novel protein domain attached to the
outside of M13, we insert DNA that codes for mature
BPTI after A23 of the precoat protein of M13. Mature
BPTI begins with an arginine residue, which is charged;
cleavage by signal peptidase I is normal in such cases.
Signal peptidase I (SP-I) cuts a chimera of M13 coat
protein and BPTI after A23 leaving mature BPTI attached
at its carboxy end to the amino terminus of M13 CP.

The following amino-acid sequence, called AA_seq2,
is constructed by inserting the sequence for mature
BPTI (shown underscored) immediately after the signal
sequence of M13 precoat protein (indicated by the
arrow) and before the sequence for the M13 CP.

AA_seq2

             1    1    2   | 2    3    3    4    4    5




       10   11   11   12   12   13
        5    0    5    0    5    0
    EYIGYAWAMVVVIVGATIGIKLFKKFTSKAS

Sequence numbers of fusion proteins refer to the
fusion, as coded, unless otherwise noted. Thus the
alanine that begins M13 CP is referred to as "number
82", "number 1 of M13 CP", or "number 59 of the mature
BPTI-M13 CP fusion".

The osp-ipbd gene is regulated by the lacUV5
promoter and terminated by the trpA transcription
terminator. The host strain of E. coli harbors the
lacI^q gene. The osp-ipbd gene is expressed and
processed in parallel with the wild-type gene VIII. The
novel protein, that consists of BPTI tethered to a
M13 CP domain, constitutes only a fraction of the coat.
Affinity separation is able to separate phage carrying
only five or six copies of a molecule that has high
affinity for an affinity matrix (SMIT85); 2%
incorporation of the chimeric protein results in about
30 copies of the protein exposed on the surface. If
this is insufficient, additional copies may be provided
by, for example, increasing IPTG.

A model comprising M13 coat, after the model for
f1 of Marvin and colleagues (BANN81), and a BPTI
domain, taken from the Brookhaven Protein Data Bank
entry "6PTI", was constructed by standard model
building methods that insure that covalent bond lengths
and angles are close to acceptable values. The model
shows that the fusion protein could fit into the
supramolecular structure in a stereochemically
acceptable fashion without disturbing the internal
structure of either the M13 CP or BPTI domain.

The ambiguous DNA sequence coding for AA_seq2 is
examined by a computer program for places where
recognition sites for restriction enzymes could be
created without altering the amino-acid sequence. (See
Sec. 4.3). A master table of enzymes is compiled from
those available from commercial suppliers that
do not cut the OCV (preferably constructed as
described above).
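
The kind of search described above can be sketched as follows. This is an illustrative Python outline and not the program referred to in Sec. 4.3: it assumes the standard genetic code, and the function name `silent_site_positions` and the example inputs are hypothetical. For each offset in the coding sequence it asks whether some choice of synonymous codons spells out the recognition site.

```python
from itertools import product

# Standard genetic code, bases ordered T, C, A, G at each codon position.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

# Map each amino acid (or '*' for stop) to all codons that encode it.
SYNONYMS = {}
for codon, aa in CODON_TABLE.items():
    SYNONYMS.setdefault(aa, []).append(codon)


def silent_site_positions(protein, site):
    """Return 0-based DNA offsets at which `site` can be placed by
    choosing synonymous codons, leaving `protein` unchanged."""
    hits = []
    n_bases = 3 * len(protein)
    for start in range(n_bases - len(site) + 1):
        first = start // 3                    # first codon touched
        last = (start + len(site) - 1) // 3   # last codon touched
        choices = [SYNONYMS[aa] for aa in protein[first:last + 1]]
        offset = start - 3 * first
        # Try every combination of synonymous codons over the window.
        for combo in product(*choices):
            window = "".join(combo)
            if window[offset:offset + len(site)] == site:
                hits.append(start)
                break
    return hits


# Example: a Kpn I site (GGTACC) fits within codons for Leu-Val-Pro,
# and an Xho I site (CTCGAG) fits exactly on Leu-Glu.
kpn_hits = silent_site_positions("LVP", "GGTACC")
xho_hits = silent_site_positions("LE", "CTCGAG")
```

A six-base site touches at most three codons, so the exhaustive search over synonymous codons stays small. A fuller tool would also check that each proposed site is unique in the vector.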



Using the procedure given in Sec. 4.3, we design a
ipbd gene, such as that shown in Table 25. Some
restriction enzymes (e.g. Ban I or Hph I) cut the OCV
too often to be of value.

The entire DNA sequence of the osp-ipbd gene
with annotation appears in Table 25 showing the useful
restriction sites and biologically important features,
viz. the lacUV5 promoter, the lacO operator, the
Shine-Dalgarno sequence, the amino acid sequence, the
stop codons, and the transcriptional terminator.

The ipbd gene is synthesized in several steps
using the method described in Sec. 5.1, generating
dsDNA fragments of 150 to 190 base pairs.

The four steps (See Sec. 6.1) by which we clone
synthetic fragments of the m13cp-bpti gene (the
osp-ipbd gene of the present example) into pLG3 and its
derivatives are illustrated in Figure 5.

The sequence to be introduced into pLG3 comprises
a) the segment from RsrII to AvrII (Table 20), b) a
spacer sequence (gccgctcc), and c) the segment from
AsuII to SauI. The segment is 158 bases long and is
synthesized from two shorter synthetic oligo-nts as
described in Sec. 5.1 of the generic specification.

Table 27 shows the anti-sense strand of the
sequence to be inserted. The 99 base long fragment
shown in upper case letters and underscored (olig#3) is
synthesized in the standard manner. Similarly the 100
base long fragment (olig#4) is synthesized. After
annealing, the double-stranded region is extended with
Klenow fragment by the procedure given above to make
the entire 176 bases double stranded. The overlap
region is 23 base pairs long and contains 14 CG pairs
and 9 AT pairs. The DNA between AvrII and AsuII does
not code for anything in the final ipbd gene; it is
there so that the DNA can be cut by both AvrII and
AsuII at the same time in the next step. Eight bases
have been added to the left of RsrII and nine bases
have been added to the left of SauI (same specificity
and cutting pattern as Bsu36I). These bases at the
ends are not part of the final product; they must be
present so that the restriction enzymes can bind and
cut the synthetic DNA to produce specific sticky ends.
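
The fragment lengths quoted above can be checked with a few lines of arithmetic. The Python sketch below is illustrative only; the function names are hypothetical. It computes the length of the duplex obtained when two overlapping oligo-nts are annealed and filled in, and the GC fraction of the overlap.

```python
def duplex_length(len_a, len_b, overlap):
    """Length of the filled-in duplex made from two oligo-nts whose
    3' ends overlap by `overlap` base pairs."""
    return len_a + len_b - overlap


def gc_fraction(gc_pairs, at_pairs):
    """Fraction of G:C pairs in a duplex region."""
    return gc_pairs / (gc_pairs + at_pairs)


# A 99-base and a 100-base oligo-nt sharing a 23 bp overlap.
full_length = duplex_length(99, 100, 23)

# The 23 bp overlap with 14 CG pairs leaves 9 AT pairs.
overlap_gc = gc_fraction(14, 23 - 14)
```

A GC-rich overlap of this length is a common design choice for stable annealing before extension; the exact annealing behaviour depends on conditions not modelled here.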

The synthetic DNA is cut with both SauI and RsrII
and is ligated to similarly cut dsDNA of pLG3. The
construct with the correct insert is called pLG4.

The second step of the construction of the OCV is
illustrated in Table 28. As in the construction of
pLG4, two pieces of single-stranded DNA are
synthesized: a 09 base long fragment of the anti-sense
strand ending with P25 and a 99 base long fragment
(starting with pXS). Both the synthetic dsDNA and dsRF
pLG4 DNA are cut with both AvrII and AsuII and are
ligated and used to transform E. coli. The construct
carrying this second insert is called pLG5.

Construction of pLG6 proceeds similarly to the
construction of pLG5. The sequence is shown in Table
30. The two single stranded segments (one from the
anti-sense strand ending with H§£> and the other from
the sense strand starting with the third base of the
codon for Y58) are synthesized, annealed, and extended
with Klenow fragment. Both the synthetic DNA and RF
pLG5 are cut with both BssHII and AsuII, purified, and
the appropriate pieces are ligated and used to
transform E. coli.

The construction of pLG7 is illustrated in Table
32 and proceeds similarly to the constructions of pLG4,
pLG5, and pLG6. The two single stranded segments (one
from the anti-sense strand ending with the first base
of the codon for V110 and the other beginning with
E101) are synthesized, annealed, and extended with
Klenow fragment. Both the synthetic DNA and RF pLG6
are cut, purified, and the
appropriate pieces are ligated and used to transform E.
coli. The construct with the correct fourth insert is
called pLG7; the display of BPTI on the outer surface
of LG7 is verified by the methods of Sec. 8.

M13am429 is an amber mutation of M13 used to
reduce nonspecific binding by the affinity matrix for
phages derived from M13. M13am429 is derived by
standard genetic methods (MILL72) from wtM13.

Phage LG7 is grown on E. coli strain PE384 in LB
broth with various concentrations of IPTG added to the
medium to induce the osp-ipbd gene. Phage LG7 is
obtained from cells grown with 0.0, 0.1, 1.0, 10.0 or
100.0 uM, or 1.0 mM IPTG, harvested (See Sec. 7) by the
method of Salivar (SALI64), and concentrated to obtain
a titre of 10^12 pfu/ml by the method of Messing
(MESS83).

The preferred method of determining whether LG7
displays BPTI on its surface (See Sec. 8) is to
determine whether these phage can retain a labeled
derivative of trypsin (trp) or anhydrotrypsin (AHTrp)
on a filter that allows passage of unbound trp or
AHTrp. Trypsin contains tyrosines and can
be iodinated with 125I by standard methods; we denote
the labeled trypsin as "trp*". Labeled anhydrotrypsin
is denoted as "AHTrp*". Other types of labels can be
used on trp or AHTrp, e.g. biotin or a fluorescent
label. AHTrp* or trp* is labeled to an activity of 0.3
uCi/ug. A sample of 10^12 LG7 (1.0 mM IPTG) is mixed with
1.0 ug of trp* or AHTrp* in 1.0 ml of a buffer of 10 mM
KCl, adjusted to pH 8.0 with 1 mM K2HPO4/KH2PO4. The
mixture is passed through an Amicon MBPX system fitted
with a membrane filter that allows passage of proteins
smaller than M_r = 300,000. Filters are soaked in
buffer containing trp or AHTrp prior to the analysis.
The filter is washed twice with 0.5 ml of buffer
containing trp or AHTrp. The radioactivity retained on
the filter is quantitated with a scintillation counter
or other suitable device. If each virion displays one
copy of BPTI, then .05 ug of protein can be bound that
would give rise to 3 x 10^4 disintegrations/minute on
the filter.
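
The expected count rate can be estimated from the quantities given above. The following Python sketch is a rough check under stated assumptions: a molecular weight of about 23,300 for trypsin (an assumed round figure, not taken from this text) and the standard conversion 1 uCi = 2.22 x 10^6 disintegrations/minute. The function names are hypothetical.

```python
AVOGADRO = 6.022e23          # particles per mole
DPM_PER_UCI = 2.22e6         # 1 uCi = 3.7e4 decays/s = 2.22e6 decays/min


def bound_protein_ug(n_virions, copies_per_virion, mol_weight):
    """Mass (ug) of labeled protein bound if every displayed copy
    captures one protein molecule."""
    moles = n_virions * copies_per_virion / AVOGADRO
    return moles * mol_weight * 1e6   # grams -> micrograms


def expected_dpm(mass_ug, specific_activity_uci_per_ug):
    """Disintegrations per minute retained with the bound protein."""
    return mass_ug * specific_activity_uci_per_ug * DPM_PER_UCI


# 10^12 virions, one BPTI each, trypsin taken as ~23,300 g/mol (assumed).
mass = bound_protein_ug(1e12, 1, 23_300)

# Using the rounded figure of 0.05 ug at 0.3 uCi/ug.
dpm = expected_dpm(0.05, 0.3)
```

With these assumptions the bound mass comes out at a few hundredths of a microgram and the count rate at a few times 10^4 disintegrations/minute, in line with the figures quoted in the text.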

An alternative way to quantitate display of BPTI
on the surface of LG7 is to use the stoichiometric
binding between trypsin and BPTI to titrate the BPTI.
A solution that titers 10^12 pfu/ml of a phage is
approximately 1.6 x 10^-9 M in phage if each virion is
infective. The ratio of pfu to total phage can be
determined spectrophotometrically using the molar
extinction coefficients at 260 nm and 280 nm corrected
for the increased length of LG7 as compared to wtM13.
For example, if a 1.0 ml solution that contains 10^12
pfu of LG7 phage grown with 1.0 mM IPTG inhibits
trypsin solutions up to 4.8 x 10^-7 M, we calculate that
there are approximately 300 BPTIs/GP ((4.8 x 10^-7
moles of BPTI/l)/(1.6 x 10^-9 moles of phage/l)).
Inhibition of a specified concentration of trypsin is
most easily measured spectrophotometrically using a
peptide-linked dye, such as N-alpha-benzoyl-Arg-pNA
(TSCH87).
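
The titration arithmetic can be written out as a short calculation. The Python sketch below is illustrative; the function names are hypothetical, and the only constant assumed is Avogadro's number. It converts a titre into a molar phage concentration and divides the inhibited trypsin concentration by it.

```python
AVOGADRO = 6.022e23  # particles per mole


def phage_molarity(pfu_per_ml, infective_fraction=1.0):
    """Molar concentration of virions for a given titre.

    infective_fraction is the ratio of pfu to total particles
    (1.0 means every virion is infective).
    """
    particles_per_litre = pfu_per_ml * 1000.0 / infective_fraction
    return particles_per_litre / AVOGADRO


def bpti_per_phage(trypsin_inhibited_molar, phage_molar):
    """Copies of BPTI per virion, assuming 1:1 trypsin:BPTI binding."""
    return trypsin_inhibited_molar / phage_molar


# A titre of 10^12 pfu/ml with every virion infective.
molarity = phage_molarity(1e12)

# Worked example with the rounded phage concentration of 1.6e-9 M.
copies = bpti_per_phage(4.8e-7, 1.6e-9)
```

If only a fraction of the particles is infective, the same titre corresponds to more virions, so the estimated number of BPTI copies per virion falls in proportion.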

Alternatively, binding to an affinity column may
be used to demonstrate the presence of BPTI on the
surface of phage LG7. An affinity column of 2.0 ml
total volume having BioRad Affi-Gel 10(TM) matrix and
30 mg of AHTrp as affinity material is prepared by the
method of BioRad. The void volume (V_v) of this column
is, by hypothesis, 1.0 ml. This affinity column is
denoted {AHTrp}.

A sample of 10^12 M13am429 is applied to {AHTrp} in
1.0 ml of 10 mM KCl buffered to pH 8.0 with KH2PO4/
K2HPO4. The column is then washed with the same buffer
until the optical density at 280 nm of the effluent
returns to base line or 4 x V_v have been passed through
the column, whichever comes first. Samples of LG7 or
LG10 are then applied to the blocked {AHTrp} column at
10^12 pfu/ml in 1.0 ml of the same buffer. The column
is then washed again with the same buffer until the
optical density at 280 nm of the effluent returns to
base line or 4 x V_v have been passed through, whichever
comes first. Following this wash, a gradient of KCl
from 10 mM to 2 M in 3 x V_v, buffered to pH 8.0 with
phosphate, is passed over the column. The first KCl
gradient is followed by a KCl gradient running from 2 M
to 5 M in 3 x V_v. The second KCl gradient is followed
by a gradient of guanidinium Cl from 0.0 M to 2.0 M in
2 x V_v in 5 M KCl and buffered to pH 8.0 with
phosphate. Fractions of 50 ul are collected and
assayed for phage by plating 4 ul of each fraction at
suitable dilutions on sensitive cells. Retention of
phage on the column is indicated by appearance of LG7
phage in fractions that elute significantly later from
the column than control phage LG10 or wtM13. A
successful isolate of LG7 that displays BPTI is
identified, the insert and junctions are
sequenced, and this isolate is used for further work,
described below.

If vgDNA is used to obtain a functional fusion
between a BPTI mutant and M13 CP (vide infra), then DNA
from a clonal isolate is sequenced in the regions that
were variegated. Then gratuitous restriction sites for
useful restriction enzymes are removed if possible by
silent codon changes. The sequence numbers of residues
in OSP-IPBD will be changed by any insertions;
hereinafter, we will, however, denote residues inserted
after residue 23 as 23a, 23b, etc. Insertions after
residue 81 will be denoted as 81a, 81b, etc. This
preserves the numbering of residues between C5 and C55
of BPTI. Residue C5 of BPTI is always denoted as 28 in
the fusion; residue C55 of BPTI is always denoted as 78
in the fusion, and the intervening residues have
constant numbers.

Should LG7 phage from cells grown with 1.0 mM IPTG
fail to display BPTI on its surface, we have several
options. We might try to determine why the
construction failed to work as expected. There are
various possible modes of failure, including: a) BPTI
is not cleaved from the M13 signal sequence, b) BPTI is
cleaved from the M13 CP, and c) the chimeric protein is
made and cleaved after the signal sequence, but the
processed protein is not incorporated into the M13
coat. BPTI has been secreted from E. coli (MARK86);
however the M13 coat protein signal sequence was not
used, therefore problems stemming from the signal
sequence are unlikely, but possible. We could
determine whether BPTI was present in the periplasm or
bound to the inner membrane of LG7-infected cells by
assays using trp* or AHTrp*.

Proteins in the periplasm can be freed through
spheroplast formation using lysozyme and EDTA in a
concentrated sucrose solution (BIRD67, MALA64). If
BPTI were free in the periplasm, it would be found in
the supernatant. Trp* would be mixed with supernatant
and passed over a non-denaturing molecular sizing
column and the radioactive fractions collected. The
radioactive fractions would then be analysed by
SDS-PAGE and examined for BPTI-sized bands by silver
staining.

Spheroplast formation exposes proteins anchored in
the inner membrane. Spheroplasts are mixed with AHTrp*
and then either filtered or centrifuged to separate
them from unbound AHTrp*. After washing with
hypertonic buffer, the spheroplasts are analyzed for
extent of AHTrp* binding. Alternatively, membrane
proteins are analyzed by western blot analysis.

If BPTI is found free in the periplasm, then we
would expect that the chimeric protein was being
cleaved both between BPTI and the M13 mature coat
sequence and between BPTI and the signal sequence. In
that case, we should alter the BPTI/M13 CP junction by
inserting vgDNA at codons for residues 78-82 of
AA_seq2.

If BPTI is found attached to the inner membrane,
then there are two likely explanations. The first is
that the chimeric protein is being cut after the signal
sequence, but is not being incorporated into LG7
virion; the treatment would also be to insert vgDNA
between residues 78 and 82 of AA_seq2. The alternative
hypothesis is that BPTI could fold and react with
trypsin even if signal sequence is not cleaved.
N-terminal amino acid sequencing of trypsin-binding
material isolated from cell homogenate determines what
processing is occurring. If signal sequence were being
cleaved, we would use the procedure above to vary
residues between C78 and A82; subsequent passes would
add residues after residue 81. If signal sequence were
not being cleaved, we would vary residues between 23
and 27 of AA_seq2. Subsequent passes through that
process would add residues after 23.

If BPTI were found neither in the periplasm nor on
the inner membrane, then we would expect that the fault
was in the signal sequence or the
signal-sequence-to-BPTI junction. The treatment in this
case would be to vary residues between 23 and 27.

Several experiments that introduce variegation
into the bpti-gene VIII fusion are possible, including:

1) 3 variegated codons between residues 78 and 82
using olig#12 and olig#13,

2) 3 variegated codons between residues 23 and 27
using olig#14 and olig#15,

3) 5 variegated codons between residues 78 and 82
using olig#13 and olig#12a,

4) 5 variegated codons between residues 23 and 27
using olig#15 and olig#14a,

5) 7 variegated codons between residues 78 and 82
using olig#13 and olig#12b, and

6) 7 variegated codons between residues 23 and 27
using olig#15 and olig#14b.

To alter the BPTI-M13 CP junction, we introduce
DNA variegated at codons for residues between 78 and 83
into the Sph I and Sfi I sites of pLG7. The residues
after the last cysteine are highly variable in amino
acid sequences homologous to BPTI, both in composition
and length; in Table 25 these residues are denoted as
G79, G80, and A81. The first part of the M13 CP is
denoted as A82, E83, and G84. One of the oligo-nts
olig#12, olig#12a, or olig#12b and the primer olig#13
are synthesized by standard methods. The oligo-nts
are:



resxdtte 
go jgagjcGC 



GCC ! AAA j GCG 



residua 75 76 77 78 79 80 SI 81a 81b 
gc I gag ] cGC | ATG \ CGT j ACC i P "T I ~~< 1 qf'k \ qxk j gf x \ qxA 



82 83 84 35 86 
OCT ! GAA j CGT 1 GAT j GAT I 



88 89 90 91 
GCC i AAA j GCG j GCC j gcg j cc 3 f oiigU3a 
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residua 75 76 77 78 79 SO 81 31a Sib 
S' <JO ! gag | cGC i ATG I CST j ACC 1 TGC j qf k j qf k j qfk j qr.k | q.f k j - 

88 80 90 91 

1.0 

residue 91 00 80 §8 S7 36 
5 ' gg | cgc j GGC j CGC j CTT | GGC f CGG f ATC 3 s olig#X3 

where q is a mixture of (0.26 T, 0.18 C, 0.26 A, and
0.30 G), f is a mixture of (0.22 T, 0.16 C, 0.40 A, and
0.22 G), and k is a mixture of equal parts of T and G.
The bases shown in lower case at either end are spacers
and are not incorporated into the cloned gene. The
primer is complementary to the 3' end of each of the
longer oligo-nts. One of the variegated oligo-nts and
the primer olig#13 are combined in equimolar amounts
and annealed. The dsDNA is completed with all four
(nt)TPs and Klenow fragment. The resulting dsDNA and
RF pLG7 are cut with both Sfi I and Sph I, purified,
mixed, and ligated. This ligation mixture goes through
the process described in Sec. 15 in which we select a
transformed clone that, when induced with IPTG, binds
AHTrp.
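
The amino acid distribution produced by one variegated codon can be worked out from the nucleotide mixtures. The Python sketch below assumes the standard genetic code and takes the mixtures as q = (0.26 T, 0.18 C, 0.26 A, 0.30 G), f = (0.22 T, 0.16 C, 0.40 A, 0.22 G), and k = equal parts T and G, so that each set of fractions sums to one. The function name is hypothetical.

```python
from collections import defaultdict

# Standard genetic code, bases ordered T, C, A, G at each codon position.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

# Nucleotide mixtures for the three codon positions.
Q = {"T": 0.26, "C": 0.18, "A": 0.26, "G": 0.30}
F = {"T": 0.22, "C": 0.16, "A": 0.40, "G": 0.22}
K = {"T": 0.50, "G": 0.50}


def amino_acid_distribution(first, second, third):
    """Probability of each amino acid ('*' = stop) for a codon whose
    three positions are drawn independently from the given mixtures."""
    dist = defaultdict(float)
    for a, pa in first.items():
        for b, pb in second.items():
            for c, pc in third.items():
                dist[CODON_TABLE[a + b + c]] += pa * pb * pc
    return dict(dist)


qfk = amino_acid_distribution(Q, F, K)
stop_fraction = qfk["*"]          # only TAG is reachable with k = T/G
least_common = min(qfk, key=qfk.get)
```

Restricting the third base to T or G excludes the TAA and TGA stop codons while still reaching all twenty amino acids, which is the usual reason for this kind of design.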

To vary the junction between M13 signal sequence
and BPTI, we introduce DNA variegated at codons for
residues between 23 and 27 into the Kpn I and Xho I
sites of pLG7. The first three residues are highly
variable in amino acid sequences homologous to BPTI.
Homologous sequences also vary in length at the amino
terminus. One of the oligo-nts olig#14, olig#14a, or
olig#14b and the primer olig#15 are synthesized by
standard methods. The oligo-nts are:

r*a£dO» J 1? IS W SO 21 22 23 24 23 

5 5 s gjgccj qcQ j GTA j CCS j &TG j CTG j TCT j TTT j GC? j qf X | qfk f ~ 

26 2? 28 29 30 

l±> L'TC IGI C'l e;c -\ : ecg ega 3 cl.ig#14 

10 

residue 1? 18 19 20 21 22 23 24 25 36 
1 - ~c > gcG | STA ] CCG j ATS j CTG j TCT j TTT | GCT j q \ j qth j q£ k j ~ 

IS 26a 26b 2? 28 25 30 

gfk pfkjTTCkH rc Sjcgc ;c :ga 3' Tiig#14a J 

20 residua 17 IS 19 20 21 22 23 24 25 26 

Cm ^ - 2 v 1 v ; - ~ < „ ~ 

26a 26b 26C 26d 27 26 29 30 

| < \ \ erf k \ af k I oik ! TTC ! TCT i GTG j GAG \ cac j ccq \ oga 1 3 ! oiig#14b 

25 

S' | teg | egg | gcg j CTC | GAG I ACA j GAA [ 3 s olig#15 

where q is a mixture of (0.26 T, 0.18 C, 0.26 A, and
0.30 G), f is a mixture of (0.22 T, 0.16 C, 0.40 A, and
0.22 G), and k is a mixture of equal parts of T and G.
The bases shown in lower case at either end are
spacers. One of the variegated oligo-nts and the
primer are combined in equimolar amounts and annealed.
The dsDNA is completed with all four (nt)TPs and
Klenow fragment. The resulting dsDNA and RF pLG7 are
cut with both Kpn I and Xho I, purified, mixed, and
ligated. This ligation mixture goes through the
process described in Sec. 15 in which we select a
transformed clone that, when induced with IPTG, binds
AHTrp or trp.

If none of these approaches produces a working
chimeric protein, we may try a different signal
sequence, or a different OSP in M13 (e.g. the gene III
protein for which there is fusion data (SMIT85,
CRUZ88)), or another genetic package.

BPTI binds very tightly to trypsin
(K_d = 6.0 x 10^-14 M) and to anhydrotrypsin, so that
these molecules are not preferred for optimising the
amount of BPTI to display on LG7 or the amount of
affinity molecule to attach to the column. Tschesche
et al. reported on the binding of several BPTI
derivatives to various proteases:

Dissociation constants for BPTI derivatives, Molar.

Residue   Trypsin        Chymotrypsin   Elastase       Elastase
#15       (bovine        (bovine        (porcine       (human
          pancreas)      pancreas)      pancreas)      leukocytes)

lysine    6.0 x 10^-14   9.0 x 10^-9    -              3.5 x 10^-6
glycine   -              -              +              7.0 x 10^-9
alanine   +              -              2.8 x 10^-8    2.5 x 10^-9
valine    -              -              5.7 x 10^-8    1.1 x 10^-10
leucine   -              -              1.9 x 10^-8    2.9 x 10^-9

From the report of Tschesche et al. we infer that
molecular pairs marked "+" have K_d s greater than
3.5 x 10^-6 M and that molecular pairs marked "-" have
K_d s much greater than 3.5 x 10^-6 M. Because of the
wealth of data about the binding of BPTI and various
mutants to trypsin and other proteases (TSCH87), we can
proceed in various ways. (For other PBDs we can obtain
two different monoclonal antibodies, one with a high
affinity having K_d of order 10^-11 M, and one with a
moderate affinity having K_d on the order of 10^-6 M.)
In this example, we may use: a) the moderate binding
between BPTI and human leukocyte elastase (HuLE1), b)
the moderately strong binding of porcine elastase to
BPTI(V15), or c) the binding of BPTI(A15) (residue 38
in the osp-ipbd gene) for trypsin (weak but detectable)
or for porcine pancreatic elastase.

We compare the retention of LG7 virions to the
retention of wild-type M13 on {AHTrp}. M13 derivatives
having more DNA than wild-type M13 have correspondingly
longer virions. Thus we will create pLG8 that differs
from pLG7 only in having stop codons at codons 2 and
3, and an altered codon at codon 7 of the osp-ipbd
gene. Phage LG8 will have exactly as much DNA as LG7;
therefore the LG8 virion is exactly as long as the LG7
virion. LG8 can not, however, display BPTI on its
surface.

To expedite identification of different
derived phage, we replace the amp^R gene of LG8 with the
tet^R gene from pBR322 by standard methods. The
B„S!I-to-AatII tet^R bearing fragment of pBR322 is
ligated into DNA from pLG8 cut with IM1 and AatII. The
correct construction, having 9.2 kb, is easily
distinguished from pBR322 and is called pLG10.

The phage LG7 is grown at various levels of IPTG
in the medium and harvested in the way previously
described. An affinity column having bed volume of 2.0
ml and supporting an amount of HuLE1 picked from the
range 0.1 mg to 30.0 mg on 1 ml of BioRad Affi-
Gel 10(TM) or Affi-Gel 15(TM) is designated {HuLE1}.
An appropriate set of densities of HuLE1 on the column
is (0.1 mg/ml, 0.5 mg/ml, 2.0 mg/ml, 8.0 mg/ml, 15.0
mg/ml, and 30.0 mg/ml). The V_V of {HuLE1} is, by
hypothesis, 1.0 ml. The elution of LG7 phage is
compared to the elution of LG10 on {HuLE1} having
varying amounts of HuLE1 affixed. The columns are
eluted in a standard way:

1) 10 mM KCl buffered to pH 8.0 with phosphate,
until optical density at 280 nm falls to base line
or 4 x V_V, whichever is first,

2) a gradient of 10 mM to 2 M KCl in 3 x V_V, pH
held at 8.0 with phosphate,

3) a gradient of 2 M to 5 M KCl in 3 x V_V,
phosphate buffer to pH 8.0,

4) constant 5 M KCl plus 0 to 0.8 M guanidinium Cl
in 2 x V_V, with phosphate buffer to pH 8.0.

The preferred level of induction (IPTG_optimal) and
amount of affinity molecule on the matrix
(DoAMoM_optimal) are those settings that give the
sharpest LG7 elution peak that shows significant
retardation as compared to LG8, which carries no BPTI.
By hypothesis, the best separation occurs for the
amount of BPTI/GP produced when the cells are induced
with 10.0 uM IPTG and when 4.0 mg HuLE1/ml is applied
to BioRad Affi-Gel 10(TM).

When the amount of BPTI/GP and the amount of
HuLE1/volume of support have been optimized, we turn to
optimization of elution rate, initial ionic strength,
and the amount of GP/(volume of support). These
parameters can be optimized separately.

Using optimal BPTI/GP and HuLE1/volume of support,
we measure the elution volume of LG7 and LG8 at several
fractions of the maximum flow rate. By hypothesis, 1/4 of
maximum elution rate is better than 1/2, but 1/8 is
about the same as 1/4. Therefore 1/4 maximum elution
rate will be used.

Elution volumes of LG7 obtained from cells grown
on media that is 2.0 mM in IPTG are measured at optimal
DoAMoM and elution rate for loadings of 10^9, 10^10,
10^11, and 10^12 pfu. By hypothesis, 10^12 pfu of pure
LG7 overloads the column and a significant number of
phage elute before their characteristic position in the
KCl gradient. We also find that 10^11 pfu overloads the
column only slightly, and that 10^10 pfu does not
overload the column. Because the use of the affinity
separation in Sec. 15 will involve a population in
which no single member is more than one part in 10^4, we
conclude that 10^12 pfu of a variegated population could
be applied to a column of 1.0 ml matrix volume without
overloading with respect to any one species. The
overloading of a 1.0 ml column by 10^12 pfu also
indicates that the initial column that captures
indiscriminately adhesive phage should be 5 to 10 times
as large as the column that supports the target
material.
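
The loading argument above can be checked with a short
calculation. The following is only a sketch: the function name
is illustrative, and the threshold of 10^10 pfu is simply the
largest single-species load that, by the hypothesis stated
above, did not overload the column.

```python
# Hypothetical values taken from the preceding paragraph.
TOTAL_LOAD_PFU = 1e12             # variegated population applied to the column
MAX_FRACTION_PER_SPECIES = 1e-4   # no single member exceeds one part in 10^4
NO_OVERLOAD_PFU = 1e10            # largest pure load that did not overload

def max_pfu_per_species(total_pfu, max_fraction):
    """Upper bound on the pfu contributed by any one species."""
    return total_pfu * max_fraction

per_species = max_pfu_per_species(TOTAL_LOAD_PFU, MAX_FRACTION_PER_SPECIES)
print(f"largest single-species load: {per_species:.1e} pfu")
print("overloaded" if per_species > NO_OVERLOAD_PFU else "not overloaded")
```

Under these assumptions each species contributes at most about
10^8 pfu, two orders of magnitude below the load that was
tolerated for pure LG7.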

Elution volumes of LG7 and LG10 obtained from
cells grown on media that is 2.0 mM in IPTG are
measured at optimal conditions and for a loading of
10^10 pfu for various initial ionic strengths: 1.0 mM,
5.0 mM, 10.0 mM, 20.0 mM, and 50.0 mM. We may find,
for example, that LG10 is slightly retarded by the
column when loaded at 1.0 mM KCl, but that LG7 always
comes off the column at its characteristic place in the
gradient. We use 10.0 mM as initial ionic strength in
all remaining affinity separations.

To determine the sensitivity of chromatography of
phage that display variants of BPTI on their surfaces
(Sec. 10.1), we prepare artificial mixtures of two
closely-related phage that differ only at one residue
in the BPTI domain. One variety of phage has strong
affinity for the column used in this step, while the
other phage has no affinity for the column. We
chromatograph these mixtures to discover how little of
the phage that binds to the column can be detected
within a large majority of phage that do not bind the
column.

For these tests we choose AHTrp as AfM(BPTI). A
column having 2 ml bed volume is prepared with
(DoAMoM_optimal mg of AHTrp)/(ml of Affi-Gel 10(TM)).
This column is called {AHTrp} and has V_V = 1.0 ml.

A new phage, LG9, is prepared that displays
BPTI(V15) as IPBD in contrast to LG7 that displays
BPTI(K15, wild-type) as IPBD. Residue 15 of BPTI is
residue 38 of the osp-ipbd gene. We introduce the
change K38 to V by replacement of a short segment of
the osp-ipbd gene between Apa I and 8&u I. The correct
construction is called pLG9. To expedite
differentiation between LG7 and an LG9-derivative
phage, we replace the amp^R gene of LG9 with the tet^R
gene from pBR322 fr

blunted) and AatII (1428) is ligated to dsDNA from pLG9
cut with XbaI (blunted) and AatII. The correct
construction, having 9.2 kb, is easily distinguished
from pBR322 and is called pLG11. DNA from phage LG11 is
sequenced in the vicinity of the junctions of the newly
inserted tet^R gene to confirm the construction.

LG7 and LG11 are grown with optimum IPTG (2.0 mM)
and harvested. Mixtures are prepared in the ratios

    LG7 : LG11 :: 1 : V_lim

where V_lim ranges from 10^10 to 10^5 by factors of 10.
Large values of V_lim are tested first; once a V_lim is
found that allows recovery of LG7, smaller values of
V_lim are not tested.

The column {AHTrp} is first blocked by treatment
with 10^11 virions of M13am429 in 100 ul of 10 mM KCl
buffered to pH 8.0 with phosphate; the column is washed
with the same buffer until OD_280 returns to base line
or 4 x V_V have passed through the column, whichever
comes first. One of the mixtures of LG7 and LG11
containing 10^12 pfu in 1 ml of the same buffer is
applied to {AHTrp}. The column is eluted in a standard
way:

1) 10 mM KCl buffered to pH 8.0 with phosphate,
until optical density at 280 nm falls to base line
or 4 x V_V, whichever is first (discard effluent),

2) a gradient of 10 mM to 2 M KCl in 3 x V_V, pH
held at 8.0 with phosphate (30 x 100 ul
fractions),

3) a gradient of 2 M to 5 M KCl in 3 x V_V,
phosphate buffer to pH 8.0 (30 x 100 ul
fractions),

4) constant 5 M KCl plus 0 to 0.8 M guanidinium Cl
in 2 x V_V, with phosphate buffer to pH 8.0 (20 x
100 ul fractions),

5) constant 5 M KCl plus 0.8 M guanidinium Cl in
1.2 x V_V, with phosphate buffer to pH 8.0 (12 x
100 ul fractions).
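
The fraction counts quoted in steps 2 to 5 follow from the
void volume of the column. The sketch below merely restates
that arithmetic; the names are illustrative, and the void
volume of 1.0 ml is the hypothetical value given above for
{AHTrp}.

```python
V_V_ML = 1.0        # void volume of {AHTrp}, by hypothesis
FRACTION_UL = 100   # fraction size collected in steps 2 to 5

# (step number, eluent volume expressed as a multiple of V_V)
COLLECTED_STEPS = [(2, 3.0), (3, 3.0), (4, 2.0), (5, 1.2)]

def fraction_count(multiple_of_vv, vv_ml=V_V_ML, fraction_ul=FRACTION_UL):
    """Number of fractions collected when a step of this volume is run."""
    return round(multiple_of_vv * vv_ml * 1000 / fraction_ul)

counts = {step: fraction_count(mult) for step, mult in COLLECTED_STEPS}
total_fractions = sum(counts.values())
print(counts, total_fractions)
```

With these inputs the counts agree with the 30, 30, 20, and 12
fractions listed in the protocol.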

Samples of 4 ul from each fraction are plated at
suitable dilution on phage-sensitive Sup- cells (so
that M13am429 will not grow). A sample of the column
matrix is also used as inoculum for phage-sensitive
Sup- cells. Plaques are transferred to ampicillin-
containing LB agar, and amp^R colonies are tested for
display of BPTI(K15) by use of trp or AHTrp.

By hypothesis, V_lim = 4.0 x 10^8 is the largest
value for which LG7 can be recovered. Thus C_sensi =
4.0 x 10^8. Three cycles of chromatography are required
to isolate LG7, so the first approximation to C_eff is
740 ( = exp{ log_e(4.0 x 10^8)/3 } ).
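
The first approximation to C_eff is the per-cycle enrichment
that, compounded over three cycles, equals C_sensi. A minimal
sketch of that calculation follows; the function name is
illustrative, and the inputs are the hypothetical values given
above.

```python
import math

C_SENSI = 4.0e8   # hypothetical sensitivity (one part in 4.0 x 10^8)
CYCLES = 3        # separation cycles needed to isolate LG7

def per_cycle_enrichment(c_sensi, cycles):
    """Geometric-mean enrichment per cycle: exp(ln(c_sensi) / cycles)."""
    return math.exp(math.log(c_sensi) / cycles)

c_eff_estimate = per_cycle_enrichment(C_SENSI, CYCLES)
print(f"{c_eff_estimate:.1f}")
```

The result lies between 736 and 737, which the text rounds to
740.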

We now determine the efficiency of the affinity
separation (Sec. 10.3). This is done by: a) preparing
mixtures of LG7 and LG11 in the ratio 1:Q, b) enriching
the population for LG7 for one separation cycle, and c)
determining the fraction of LG7 in the last phage-
bearing fraction. When Q is 1.5 x 10^4, 3% of colonies
are BPTI positive. When Q is 1.5 x 10^3, 60% of the
colonies are BPTI positive. Thus we calculate C_eff =
0.60 x 1.5 x 10^3 = 900.
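
The efficiency calculation above can be written as the ratio
of the final fraction of LG7 to its initial fraction, taking
the initial fraction as approximately 1/Q. The sketch below
applies that formula to both hypothetical trials; the function
name is illustrative.

```python
def enrichment(q, fraction_positive):
    """Enrichment in one cycle: final LG7 fraction divided by the
    initial fraction, approximated as 1/Q for large Q."""
    return fraction_positive * q

# (Q, fraction of colonies that are BPTI positive after one cycle)
trials = [(1.5e4, 0.03), (1.5e3, 0.60)]
results = [enrichment(q, frac) for q, frac in trials]
for (q, frac), value in zip(trials, results):
    print(f"Q = {q:.1e}, positive = {frac:.2f}, enrichment = {value:.0f}")
```

The second trial reproduces the value of 900 adopted in the
text; the same formula applied to the first trial gives 450.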

Our hypothetical LG7 should display one or more
BPTI domains on each virion. The osp-ipbd gene is
under control of the lacUV5 promoter so that expression
levels of BPTI::M13 CP can be manipulated via [IPTG].
This construct may be used to develop many different
binding proteins, all based on BPTI. An optimum level
of induction and amount of

2.0 mg/(ml of support)) should have been determined;
target molecules will be applied to columns in this
amount in the process disclosed in Sec. 15.1. These
optimum levels may be adequate for all targets and all
variegations of BPTI displayed on derivatives of M13
based on LG7, but some further optimization may be
needed if other values of pH or temperature are used.

Other pbd gene fragments may be substituted for
the bpti gene fragment in pLG7 with a high likelihood
that PBD will appear on the surface of the new LG7
derivative.


MlUBPill.lL gart Ill 

HHMb is chosen as a typical protein target; any
other protein could be used. HHMb satisfies all of the
criteria for a target: 1) it is large enough to be
applied to an affinity matrix, 2) after attachment it
is not reactive, and 3) after attachment there is
sufficient unaltered surface to allow specific binding
by PBDs.

The essential information for HHMb is known: 1)
HHMb is stable at least up to 70°C, between pH 4.4 and
9.3, 2) HHMb is stable up to 1.6 M guanidinium Cl, 3)
the pI of HHMb is 7.0, 4) for HHMb, M_r = 16,000, 5)
HHMb requires haem, 6) HHMb has no proteolytic
activity.

In addition, the following information about HHMb
and other myoglobins is available: 1) the sequence of
HHMb, 2) the 3D structure of sperm whale myoglobin
(HHMb has 15 amino acid differences and it is generally
assumed that the 3D structures are almost identical),
3) its lack of enzymatic activity, 4) its lack of
toxicity.

We set the specifications of an SBD as:

1) T = 25°C

2) pH = 8.0

3) Acceptable solutes:

   A) for binding:

      i) phosphate, as buffer, 0 to 20 mM, and
      ii) KCl, 10 mM,

   B) for column elution:

      i) phosphate, as buffer, 0 to 30 mM,
      ii) KCl, up to 5 M, and
      iii) guanidinium Cl, up to 0.8 M.

4) Acceptable % < 1.0 * 10"^ & 
We choose LG7 as GP(IPBD).

Residues to be varied are picked, in part, through
the use of interactive computer graphics to visualize
the structures. In this section, all residue numbers
refer to BPTI. We pick a set of residues that forms a
surface such that all residues can contact one target
molecule. Information relevant to choosing BPTI
residues to vary includes: 1) the 3D structure, 2)
solvent accessibility of each residue (LEEB71), 3) a
compilation of sequences of other proteins homologous
to BPTI, and 4) knowledge of the structural nature of
different amino acid types.

Tables 16 and 34 indicate which residues of BPTI:
a) have substantial surface exposure, and b) are known
to tolerate other amino acids in other closely related
proteins. We use interactive computer graphics to pick
sets of eight to twenty residues that are exposed and
variable, and such that all members of one set can touch
a molecule of the target material at one time. If BPTI
has a small amino acid at a given residue, that amino
acid may not be able to contact the target
simultaneously with all the other residues in the
interaction set, but a larger amino acid might well
make contact. A charged amino acid might affect
binding without making direct contact. In such cases,
the residue should be included in the interaction set,
with a notation that larger residues might be useful.
In a similar way, large amino acids near the geometric
center of the interaction set may prevent residues on
either side of the large central residue from making
simultaneous contact. If a small amino acid, however,
were substituted for the large amino acid, then the
surface would become flatter and residues on either
side could make simultaneous contact. Such a residue
should be included in the interaction set with a
notation that small amino acids may be useful.

Table 35 was prepared from standard model parts
and shows the maximum span between C_beta and the tip of
each type of side group. C_beta is used because it is
rigidly attached to the protein main-chain; rotation
about the C_alpha-C_beta bond is the most important
degree of freedom for determining the location of the
side group.

Table 34 indicates five surfaces that meet the
given criteria. The first surface comprises the set of
residues that contacts trypsin in the complex of
trypsin with BPTI as reported in the Brookhaven Protein
Data Bank entry "1TPA". This set is indicated by the
number "1". The exposed surface of the residues in
this set (taken from Table 16) totals 1148 Å^2 and
approximates the area of contact between BPTI and
trypsin.

Other surfaces, numbered 2 to 5, were picked by
first picking one exposed, variable residue and then
picking neighboring residues until a surface was
defined. The choice of sets of residues shown in Table
34 is in no way exhaustive or unique; other sets of
variable, surface residues can be picked. Hereinafter
we refer to K15 as being at the top of the molecule,
while the carboxy and amino termini are at the bottom.

Solvent accessibilities are useful, easily
tabulated indicators of a residue's exposure. Solvent
accessibilities must be used with some caution; small
amino acids are under-represented and large amino acids
over-represented. The user must consider what the
solvent accessibility of a different amino acid would
be when substituted into the structure of BPTI.

To create specific binding between a derivative of
BPTI and HHMb, we will vary the residues in set #2.
This set includes the twelve principal residues 17(R),
19(I), 21(Y), 27(A), 28(G), 29(L), 31(Q), 32(T), 34(V),
48(A), 49(E), and 52(M) (Sec. 13.1.1). None of the
residues in set #2 is completely conserved in the
sample of sequences reported in Table 34; thus we can
vary them with a high probability of retaining the
underlying structure. Independent substitution at each
of these twelve residues of the amino acid types
observed at that residue would produce approximately
4.4 x 10^9 amino acid sequences and the same number of
surfaces.

BPTI is a very basic protein. This property has
been used in isolating and purifying BPTI and its
homologues so that the high frequency of arginine and
lysine residues may reflect bias in isolation and is
not necessarily required by the structure. Indeed,
SCI-III from Bombyx mori contains seven more acidic
than basic groups (SASA84).

Residue 17 is highly variable and fully exposed
and can contain H, K, A, H, F, L, K, T, G, Y, P, or
S. All types of amino acids are seen: large, small,
charged, neutral, and hydrophobic. That no acidic
groups are observed may be due to bias in the sample.

Residue 19 is also variable and fully exposed,
containing P, R, I, S, K, Q, and L.

Residue 21 is not very variable, containing F or Y
in 31 of 33 cases and I and W in the remaining cases.
The side group of Y21 fills the space between T32 and
the main chain of residues 47 and 48. The OH at the
tip of the Y side group projects into the solvent.
Clearly one can vary the surface by substituting Y or F
so that the surface is either hydrophobic or
hydrophilic in that region. It is also possible that
the other aromatic amino acid (viz. H) or the other
hydrophobics (L, M, or V) might be tolerated.

Residue 27 most often contains A, but S, K, L, and
T are also observed. On structural grounds, this
residue will probably tolerate any hydrophilic amino
acid and perhaps any amino acid.

Residue 28 is G in BPTI. This residue is in a
turn, but is not in a conformation peculiar to glycine.
Six other types of amino acids have been observed at
this residue: K, K, Q, fi, H, and N. Small side groups
at this residue might not contact HHMb simultaneously
with residues 17 and 34. Large side groups could
interact with HHMb at the same time as residues 17 and
34. Charged side groups at this residue could affect
binding of HHMb on the surface defined by the other
residues of the principal set. Any amino acid, except
perhaps P, should be tolerated.

Residue 29 is highly variable, most often
containing L. This fully exposed position will
probably tolerate almost any amino acid except,
perhaps, P.

Residues 31, 32, and 34 are highly variable,
exposed, and in extended conformations; any amino acid
should be tolerated.

Residues 48 and 49 are also highly variable and
fully exposed; any amino acid should be tolerated.

Residue 52 is in an alpha helix. Any amino acid,
except perhaps P, might be tolerated.



Now we consider possible variation of the
secondary set (Sec. 13.1.2) of residues that are in the
neighborhood of the principal set. Neighboring
residues that might be varied at later stages include
9(P), 11(T), 15(K), 16(A), 18(I), 20(R), 22(F), 24(N),
26(K), 35(Y), 47(S), 50(D), and 53(R).

Residue 9 is highly variable, extended, and
exposed. Residue 9 and residues 48 and 49 are
separated by a bulge caused by the ascending chain from
residue 31 to 34. For residue 9 and residues 48 and 49
to contribute simultaneously to binding, either the
target must have a groove into which the chain from 31
to 34 can fit, or all three residues (9, 48, and 49)
must have large amino acids that effectively reduce the
radius of curvature of the BPTI derivative.

Residue 11 is highly variable, extended, and
exposed. Residue 11, like residue 9, is slightly far
from the surface defined by the principal residues and
will contribute to binding in the same circumstances.

Residue 15 is highly varied. The side group of
residue 15 points away from the face defined by set #2.
Changes of charge at residue 15 could affect binding on
the surface defined by residue set #2.

Residue 16 is varied but points away from the
surface defined by the principal set. Changes in
charge at this residue could affect binding on the face
defined by set #2.

Residue 18 is I in BPTI. This residue is in an
extended conformation and is exposed. Five other amino
acids have been observed at this residue: M, F, L, V,
and T. Only T is hydrophilic. The side group points
directly away from the surface defined by residue set
#2. Substitution of charged amino acids at this
residue could affect binding at the surface defined by
residue set #2.

Residue 20 is R in BPTI. This residue is in an
extended conformation and is exposed. Four other amino
acids have been observed at this residue: A, S, L, and
a*. The side group points directly away from the
surface defined by residue set #2. Alteration of the
charge at this residue could affect binding at the
surface defined by residue set #2.

Residue 22 is only slightly varied, being Y, F, or
H in 30 of 33 cases. Nevertheless, A, N, and S have
been observed at this residue. Amino acids such as L,
M, I, or Q could be tried here. Alterations at residue
22 may affect the mobility of residue 21; changes in
charge at residue 22 could affect binding at the
surface defined by residue set #2.

Residue 24 shows some variation, but probably
cannot interact with one molecule of the target
simultaneously with all the residues in the principal
set. Variation in charge at this residue might have an
effect on binding at the surface defined by the
principal set.

Residue 26 is highly varied and exposed. Changes
in charge may affect binding at the surface defined by
residue set #2; substitutions may affect the mobility
of residue 27 that is in the principal set.

Residue 35 is most often Y; W has been observed.
The side group of 35 is buried, but substitution of F
or W could affect the mobility of residue 34.

Residue 47 is always T or S in the sequence sample
used. The O_gamma probably accepts a hydrogen bond from
the NH of residue 50 in the alpha helix. Nevertheless,
there is no overwhelming steric reason to preclude
other amino acid types at this residue. In particular,
other amino acids the side groups of which can accept
hydrogen bonds, viz. D, Q, and E, may be acceptable
here.

Residue 50 is often an acidic amino acid, but
other amino acids are possible.

Residue 53 is often R, but other amino acids have
been observed at this residue. Changes of charge may
affect binding to the amino acids in interaction set
#2.

From published models (HUBE77, WLOD84) one can see
that R39 is on the opposite side of BPTI from the
surface defined by the residues in set #2. Therefore,
variation at residue 39 at the same time as variation
of some residues in set #2 is much less likely to
improve binding that occurs along surface #2 than is
variation of the other residues in set #2.

In addition to the twelve principal residues and
13 secondary residues, there are two other residues,
30(C) and 33(F), involved in surface #2 that we will
probably not vary, at least not until late in the
procedure. These residues have their side groups
buried inside BPTI and are conserved. Changing these
residues does not change the surface nearly so much as
does changing residues in the principal set. These
buried, conserved residues do, however, contribute to
the surface area of surface #2. The surface of residue
set #2 is comparable to the area of the trypsin-binding
surface. Principal residues 17, 19, 21, 27, 28, 29,
31, 32, 34, 48, 49, and 52 have a combined solvent-
accessible area of 946.3 Å^2. Secondary residues 9, 11,
15, 16, 18, 20, 22, 24, 26, 35, 47, 50, and 53 have
combined surface of 1041.7 Å^2. Residues 30 and 33 have
exposed surface totaling 38.2 Å^2. Thus the three
groups' combined surface is 2026.2 Å^2.

Residue 30 is C in BPTI and is conserved in all
homologous sequences. It should be noted, however,
that C14/C38 is conserved in all natural sequences, yet
Marks et al. (MARK87) showed that changing both C14 and
C38 to A,A or T,T yields a functional trypsin
inhibitor. Thus it is possible that BPTI-like
molecules will fold if C30 is replaced.

Residue 33 is F in BPTI and in all homologous
sequences. Visual inspection of the BPTI structure
suggests that substitution of Y, M, H, or L might be
tolerated.

Given our hypothetical affinity separation
sensitivity, C_sensi, we decide to vary six residues
leaving some margin for errors in the actual base
composition of variegated bases. To obtain maximal
recognition, we choose residues from the principal set
that are as far apart as possible. Table 36 shows the
distances between the beta carbons of residues in the
principal and peripheral set. R17 and V34 are at one
end of the principal surface. Residues A27, G28, L29,
A48, E49, and M52 are at the other end, about twenty
Angstroms away; of these, we will vary residues 17, 27,
29, 34, and 48. Residues 28, 49, and 52 will be varied
at later rounds.

Of the remaining principal residues, 21 is left to
later variations. Among residues 19, 31, and 32, we
arbitrarily pick 19 to vary.

Unlimited variation of six residues produces 6.4 x
10^7 amino acid sequences. By hypothesis, C_sensi is 1
in 4 x 10^8. Table 37 shows the programmed variegation
at the chosen residues. The parental sequence is
present as 1 part in 5.5 x 10^7, but the least favored
sequences are present at only 1 part in 4.2 x 10^9.
Among single-amino-acid substitutions from the PPBD,
the least favored is F17-I19-A27-L29-V34-A48 and has a
calculated abundance of 1 part in 1.6 x 10^8. Using the
optimal qfk codon, we can recover the parental sequence
and all one-amino-acid substitutions to the PPBD if
actual nt compositions come within 5% of programmed
compositions. The number of transformants is M_ntv =
1.0 x 10^9 (also by hypothesis); thus we will produce
most of the programmed sequences.
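
The count of sequences from unlimited variation at six
residues is 20^6. The sketch below checks that figure and
compares it with the hypothetical C_sensi; it assumes, for
illustration only, that all sequences are equally abundant,
which the programmed variegation described above does not
achieve.

```python
AMINO_ACID_TYPES = 20
VARIED_RESIDUES = 6
C_SENSI = 4e8   # hypothetical: one part in 4 x 10^8 can be recovered

# Every combination of one amino acid type at each varied residue.
n_sequences = AMINO_ACID_TYPES ** VARIED_RESIDUES

def detectable(one_part_in, c_sensi=C_SENSI):
    """True if a sequence present at 1 part in `one_part_in` is within
    the sensitivity of the separation."""
    return one_part_in <= c_sensi

print(f"{n_sequences:.1e}", detectable(n_sequences))
```

Under the equal-abundance assumption, a sequence present at
one part in 6.4 x 10^7 is within the stated sensitivity, which
leaves the margin for composition errors mentioned above.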

The residue numbers above refer to mature BPTI.
Since Table 25 refers to the pre-M13CP-BPTI protein,
all mature BPTI sequence numbers have been increased by
the length of the signal sequence, 23. Thus, we wish
to vary residues 40, 42, 50, 52, 57, and 71. A DNA
subsequence containing all these codons is found
between the ApaI site at base 191 and the SphI site
at base 309 of the osp-pbd gene. Among ApaI, DraII,
and PssI, ApaI is preferred because it recognizes six
bases without any ambiguity and will cut fewer
sequences in the vgDNA. Gratuitous restriction sites
can be avoided in some cases by use of codon ambiguity;
changing the codon for G51 from GGC to GGT makes it
impossible to generate an ApaI site at codons 50, 51,
and 52.
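
The renumbering described above is a constant offset equal to
the length of the signal sequence. A minimal sketch, with
illustrative names, is given here; the list of mature-BPTI
residues is the set chosen for variation in the preceding
paragraphs.

```python
SIGNAL_SEQUENCE_LENGTH = 23                 # residues preceding mature BPTI
MATURE_BPTI_VARIED = [17, 19, 27, 29, 34, 48]

def to_precursor_numbering(mature_residue, offset=SIGNAL_SEQUENCE_LENGTH):
    """Convert a mature-BPTI residue number to precursor numbering."""
    return mature_residue + offset

precursor_varied = [to_precursor_numbering(r) for r in MATURE_BPTI_VARIED]
print(precursor_varied)   # [40, 42, 50, 52, 57, 71]
```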

Each piece of dsDNA to be synthesized needs six to
eight bases added at either end to allow cutting with
restriction enzymes and is shown in Table 37. The
first synthetic base (before cutting with ApaI and
SphI) is 184 and the last is 322. There are 142 bases
to be synthesized. The center of the piece to be
synthesized lies between Q54 and V57. The overlap
cannot include varied bases, so we choose bases 245 to
256 as the overlap that is 12 bases long. Note that the
codon for F56 has been changed to TTC to increase the
GC content of the overlap. The amino acids that are
being varied are marked as X with a plus over them.
Codons 57 and 71 are synthesized on the sense (bottom)
strand. The design calls for "qfk" in the antisense
strand, so that the sense strand contains (from 5' to
3') a) equal part C and A (i.e. the complement of k),
b) (0.40 T, 0.22 A, 0.22 C, and 0.16 G) (i.e. the
complement of f), and c) (0.26 T, 0.26 A, 0.30 C, and
0.18 G).
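
The relation between the sense-strand mixtures listed above
and the "qfk" codon on the antisense strand is a reverse
complement applied to base mixtures. The sketch below derives
the antisense mixtures from the three sense-strand mixtures;
the function names are illustrative, and reading mixture c) as
the complement of q is an inference from the order in which
the positions are listed.

```python
COMPLEMENT = {"T": "A", "A": "T", "C": "G", "G": "C"}

# Sense-strand mixtures a), b), c) as listed in the text, 5' to 3'.
SENSE_MIXTURES = [
    {"C": 0.50, "A": 0.50},
    {"T": 0.40, "A": 0.22, "C": 0.22, "G": 0.16},
    {"T": 0.26, "A": 0.26, "C": 0.30, "G": 0.18},
]

def complement_mixture(mixture):
    """Swap each base for its Watson-Crick partner, keeping its fraction."""
    return {COMPLEMENT[base]: frac for base, frac in mixture.items()}

def reverse_complement(mixtures):
    """Reverse the position order and complement each mixture."""
    return [complement_mixture(m) for m in reversed(mixtures)]

antisense_codon = reverse_complement(SENSE_MIXTURES)
for name, mix in zip("qfk", antisense_codon):
    print(name, mix)
```

Under this reading, k is an equal mixture of T and G, f is
richest in A, and q is richest in G.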

Each residue that is encoded by "qfk" has 21
possible outcomes, each of the amino acids plus stop.
Table 12 gives the distribution of amino acids encoded
by "qfk", assuming 5% errors. The abundance of the
parental sequence is the product of the abundances of R
x I x A x L x V x A. The abundance of the least-
favored sequence is 1 in 4.2 x 10^9.
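
The count of 21 outcomes can be illustrated by enumerating the
standard genetic code for a codon whose first two positions
admit all four bases and whose third position is restricted
to T or G. The sketch below does only that; it assumes that
"k" denotes a T/G mixture and that q and f each contain all
four bases, and it does not attempt to reproduce the
abundances of Table 12.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ...
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
GENETIC_CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

def allowed_codons(first="TCAG", second="TCAG", third="TG"):
    """All codons that can arise from the given per-position base sets."""
    return [a + b + c for a in first for b in second for c in third]

codons = allowed_codons()
outcomes = {GENETIC_CODE[c] for c in codons}          # '*' marks stop
stop_codons = [c for c in codons if GENETIC_CODE[c] == "*"]
print(len(codons), len(outcomes), stop_codons)
```

Of the 32 possible codons, one (TAG) is a stop codon and the
rest cover all twenty amino acids.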



Olig#27 and olig#28 are annealed and extended with
Klenow fragment and all four (nt)TPs. Both the ds
synthetic DNA and RF pLG7 DNA are cut with both Apa I
and Sph I. The cut DNA is purified and the appropriate
pieces ligated (See Sec. 14.1) and used to transform
competent PB383 (Sec. 14.2). In order to generate a
sufficient number of transformants we start with 5.0 l
of cells.

1) culture E. coli in 5.0 l of LB broth at 37°C
until cell density reaches 5 x 10^7 to 7 x 10^7
cells/ml,

2) chill on ice for 65 minutes, centrifuge the
cell suspension at 4000g for 5 minutes at 4°C,

3) discard supernatant; resuspend the cells in
1667 ml of an ice-cold, sterile solution of 60
mM CaCl2,

4) chill on ice for 15 minutes, and then
centrifuge at 4000g for 5 minutes at 4°C,

5) resuspend cells in 2 x 400 ml of ice-cold,
sterile 60 mM CaCl2; store cells at 4°C for 24
hours,

6) add DNA (100 m) in 20 ml of ligation or
TE buffer; mix, incubate on ice for minutes,

7) distribute into 200 ul aliquots and heat
shock cells at 42°C for 20 seconds,

8) add 200 ml LB broth and incubate at 37°C for
1 hour,

9) add the culture to 2.0 l of LB broth
containing ampicillin at 35-100 ug/ml and
culture overnight at 37°C,



10) after 6 hours, remove 200 ml and plate 0.5
ml portions with log phase JM107 on LB agar,
using the soft-agar overlay technique. Phage
are prepared from the soft agar,

11) centrifuge the overnight culture to remove
cells, and pellet phage (MESS83),

12) harvest virions by method of Salivar et
al. (SALI64).

It is important to: a) use all or nearly all the
vgDNA synthesized in ligation, b) use all or nearly all
the ligation mixture to transform cells, and c) culture
all or nearly all the transformants. These measures
are directed at maintaining diversity.

It is important to collect virions in a way that
samples all or nearly all the transformants. Because
F- cells are used in the transformation, multiple
infections do not pose a problem in the overnight phage
production. F+ cells are used for phage production in
agar.

HHMb has a pI of 7.0 and we carry out
chromatography at pH 8.0 so that HHMb is slightly
negative while BPTI and most of its mutants are
positive. HHMb is fixed (Sec. 15.1) to a 2.0 ml column
on Affi-Gel 10(TM) or Affi-Gel 15(TM) at 4.0 mg/ml
support matrix, the same density that is optimal for a
column supporting trp.

To remove variants of BPTI with strong,
indiscriminate binding for any protein or for the
support matrix (Sec. 15.2), we pass the variegated
population of virions over a column that supports
bovine serum albumin (BSA) before loading the
population onto the {HHMb} column. Affi-Gel 10(TM) or
Affi-Gel 15(TM) is used to immobilize BSA at the
highest level the matrix will support. A 10.0 ml
column is loaded with 5.0 ml of Affi-Gel-linked-BSA;
this column, called {BSA}, has V_V = 5.0 ml. The
variegated population of virions containing 10^12 pfu in
1 ml (0.2 x V_V) of 10 mM KCl, 1 mM phosphate, pH 8.0
buffer is applied to {BSA}. We wash {BSA} with 4.5 ml
(0.9 x V_V) of 50 mM KCl, 1 mM phosphate, pH 8.0 buffer.
The wash with 50 mM salt will elute virions that adhere
slightly to BSA but not virions with strong binding.
The pooled effluent of the {BSA} column is 5.5 ml of
approximately 13 mM KCl.

The column {HHMb} is first blocked by treatment
with 10^11 virions of M13(am429) in 100 ul of 10 mM KCl
buffered to pH 8.0 with phosphate; the column is washed
with the same buffer until OD_280 returns to base line
or 2 x V_V have passed through the column, whichever
comes first. The pooled effluent from {BSA} is added
to {HHMb} in 5.5 ml of 13 mM KCl, 1 mM phosphate, pH
8.0 buffer. The column is eluted (Sec. 15.3) in the
following way:

1) 10 mM KCl buffered to pH 8.0 with phosphate,
until optical density at 280 nm falls to base line
or 2 x V_V, whichever is first (effluent
discarded),

2) a gradient of 10 mM to 2 M KCl in 3 x V_V, pH
held at 8.0 with phosphate (30 x 100 ul
fractions),

3) a gradient of 2 M to 5 M KCl in 3 x V_V,
phosphate buffer to pH 8.0 (30 x 100 ul
fractions),

4) constant 5 M KCl plus 0 to 0.8 M guanidinium Cl
in 2 x V_V, with phosphate buffer to pH 8.0 (20 x
100 ul fractions), and

5) constant 5 M KCl plus 0.8 M guanidinium Cl in 1
x V_V, with phosphate buffer to pH 8.0 (10 x 100
ul fractions).

In addition to the elution fractions, a sample is
removed from the column and used as an inoculum for
phage-sensitive Sup^0 cells (Sec. 15.4). A sample of 4
µl from each fraction is plated on phage-sensitive Sup^0
cells. Fractions that yield too many colonies to count
are replated at lower dilution. An approximate titre
of each fraction is calculated. Starting with the last
fraction and working toward the first fraction that was
titered, we pool fractions until approximately 10^9
phage are in the pool, i.e. about 1 part in 1000 of the
phage applied to the column. This population is
infected into 3 x 10^11 phage-sensitive PE384 in 300 ml
of LB broth. The low multiplicity of infection is
chosen to reduce the possibility of multiple infection.
After thirty minutes, viable phage have entered
recipient cells but have not yet begun to produce new
phage. Phage-borne genes are expressed at this phase,
and we can add ampicillin that will kill uninfected
cells. These cells still carry F-pili and will absorb
phage, helping to prevent multiple infections.
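
The pooling rule and the multiplicity of infection described above can be sketched as follows. This is a minimal illustration, not the procedure of Sec. 15.4: the function names are hypothetical and the example titres are invented solely to exercise the code.

```python
def pool_from_last(titres, target=1e9):
    """Pool fractions from the last one backwards until `target` phage are
    collected. `titres` lists phage per fraction in elution order.
    Returns (indices_pooled, total_phage)."""
    pooled, total = [], 0.0
    for idx in range(len(titres) - 1, -1, -1):
        if total >= target:
            break
        pooled.append(idx)
        total += titres[idx]
    return pooled, total


def multiplicity_of_infection(phage, cells):
    """Average number of phage per cell."""
    return phage / cells


# Invented titres, for illustration only (not data from the text).
example = [5e10, 8e9, 2e9, 6e8, 3e8, 2e8]
idx, total = pool_from_last(example)
moi = multiplicity_of_infection(1e9, 3e11)
print(idx, total, moi)
```

With about 10^9 pooled phage and 3 x 10^11 cells, the multiplicity of infection is roughly 1 phage per 300 cells, which is why multiple infection is expected to be rare.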

If multiple infection should pose a problem that
cannot be solved by growth at low multiplicity-of-
infection on F+ cells, the following procedure can be
employed to obviate the problem. Virions obtained from
the affinity separation are infected into F+ E. coli
and cultured to amplify the genetic messages (Sec.
15.5). CCC DNA is obtained either by harvesting RF DNA
or by in vitro extension of primers annealed to ss
phage DNA. The CCC DNA is used to transform F- cells
at a high ratio of cells to DNA. Individual virions
obtained in this way should bear proteins encoded only
by the DNA within.

The variegation produces as many as 6.4 x 10^7
different amino-acid sequences. C_eff is 900. Thus,
after two separation cycles, the probability of
isolating a single SBD is less than 0.10; after three
cycles, the probability rises above 0.10.
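
One way to see why the third cycle matters is a simple enrichment model. The sketch below is an assumption made for illustration, not the calculation used in the text: it supposes that exactly one of the 6.4 x 10^7 sequences is a successful binder, that all sequences start equally abundant, and that each separation cycle enriches the binder C_eff-fold relative to every other sequence.

```python
def sbd_fraction(n_sequences, c_eff, cycles):
    """Fraction of the population that is the single best binder after
    `cycles` separation cycles, under the simple model described above."""
    weight = float(c_eff) ** cycles
    return weight / (weight + (n_sequences - 1))


for cycles in (1, 2, 3):
    print(cycles, sbd_fraction(6.4e7, 900, cycles))
```

Under these assumptions the binder is still a small minority after two cycles and dominates after three, in line with the thresholds stated above.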

The phagemid population is grown and
chromatographed three times and then examined for SBDs
(Sec. 15.7). In each separation cycle, phage from the
last three fractions that contain viable phage are
pooled with phage obtained by removing some of the
support matrix as an inoculum. At each cycle, about
10^12 phage are loaded onto the column and about 10^9
phage are cultured for the next separation cycle.

After the third separation cycle, 32 colonies are
picked from the last fraction that contained viable
phage; phage from these colonies are denoted SBD1,
SBD2, ..., and SBD32.

Each of the SBDs is cultured and tested for
retention on a Pep-Tie column supporting HHMb (Sec.
15.8). SBD11 shows the greatest retention
on the Pep-Tie {HHMb} column, eluting at 367 mM KCl
while wtM13 elutes at 20 mM KCl. SBD11 becomes the
parental amino-acid sequence to the second variegation
cycle.



The result of this hypothetical experiment is 
shown in Table 33, S40 changed to D, X42 changed to 
.10 8, AS© changed to B, yai remained: h f and A7X ohanfM to 



The next round of variegation (Sec. 16) is
illustrated in Table 39. The residues to be varied are
chosen by: a) choosing some of the residues in the
principal set that were not varied in the first round
(viz., residues 42, 44, 51, 54, 55, 72, or 75 of the
fusion), and b) choosing some residues in the secondary
set. Residues 51, 54, 55, and 72 are varied through
all twenty amino acids and, unavoidably, stop. Residue
44 is only varied between Y and F. Some residues in
the secondary set are varied through a restricted
range, primarily to allow different charges (+, 0, -)
to appear. Residue 38 is varied through K, R, E, or G.
Residue 41 is varied through I, V, K, or E. Residue 43
is varied through R, S, G, N, K, D, E, T, or A.
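
Restricted sets like these arise when each codon position is synthesized from a limited mixture of bases. The sketch below enumerates the amino acids encoded by such a mixed codon using the standard genetic code. The particular base mixtures passed to encoded() are our guesses at codons that would give the sets listed above; the text does not state them, and the function name is hypothetical.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code in TCAG order; '*' marks a stop codon.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {a + b + c: AMINO[16 * i + 4 * j + k]
               for i, a in enumerate(BASES)
               for j, b in enumerate(BASES)
               for k, c in enumerate(BASES)}


def encoded(mix):
    """mix: three strings giving the bases allowed at codon positions 1-3.
    Returns the set of one-letter amino acids (and '*') that can appear."""
    return {CODON_TABLE["".join(codon)] for codon in product(*mix)}


# Assumed mixtures (not taken from the text):
print(sorted(encoded(("AG", "AG", "G"))))      # candidate for residue 38
print(sorted(encoded(("AG", "AT", "A"))))      # candidate for residue 41
print(sorted(encoded(("AG", "ACG", "ACGT"))))  # candidate for residue 43
print(sorted(encoded(("T", "AT", "T"))))       # candidate for residue 44
```

Any codon whose third position allows all bases, or T and G, and whose first two positions are unrestricted also admits at least one stop codon, which is the sense in which stop is unavoidable for the fully varied residues.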

Olig#29 and olig#30 are synthesized, annealed,
extended and cloned into ph£~7 at the Msk i/lEu X sites* 

The ligation mixture is used to transform 5 l of
competent PE383 cells so that 10^9 transformants are
obtained. A new {HHMb} is constructed using the same
support matrix as was used in round 1. A sample of
10^12 of the harvested LG7 are applied to {HHMb} and
affinity separated. The last 10^9 phage off the column
and an inoculum are pooled and cultured. The cultured
phagemids are re-chromatographed for three separation
cycles. Thirty-two clonal isolates (denoted SBD11-1,
SBD11-2, ..., SBD11-32) are obtained from the effluent
of the third separation cycle and tested for binding on
a Pep-Tie {HHMb} column. Of this set, SBD11-23 shows
the greatest retention on the Pep-Tie {HHMb} column,
eluting at about 690 mM KCl.

The results of this hypothetical selection are
shown in Table 40. Residue 38 (K15 of BPTI) changed to

S, 41 becomes V, 43 goes to ® f 44 goes to F, 51 goes to 

F r 54 goes to S, 55 goes to A , and 72 goes to Q- 

The sbd11-23 portion of the osp-pbd gene is cloned

into an expression vector and BPTX(I15, S17 f ?18, M, 
mo, Fai,- E27, F2S, L29, S31, A32, 334 r Wl t Q72) is 
expressed in the periplasm. This protein is isolated
by standard methods and its binding to HHMb is tested.
K_D is found to be 4.5 x 10^-7 M.

A third round of variegation, using SBD11-23 as
PPBD, is illustrated in Table 41; eight amino acids are
varied. Those in the principal set, residues 40, 55,
and 57, are varied through all twenty amino acids.
Residue 32 is varied through P, Q, T, K, A, or E.
Residue 34 is varied through T, P, Q, K, A, or E.
Residue 44 is varied through F, L, Y, C, W, or stop.
Residue 50 is varied through E, K, or Q. Residue 52 is
varied through L, F, I, M, or V.

The result of this variation is shown in Table 42.
The selected SBD is denoted SBD11-23-5 and elutes from
a Pep-Tie {HHMb} column at 980 mM KCl. The sbd11-23-5
segment is cloned into an expression vector and
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BP?Z{£9 ? Oil, SI 5, A17, VIS, Q.19 f H20 f W2I, Q2? f F28, 

S3X, L32 f 834., W?X f 073 J : is produced. T&L& time 
the K_D is 7.3 x 10^-9 M.

This example is hypothetical. It is anticipated
that more variegation cycles will be needed to achieve
dissociation constants of 10^-8 M. It is also possible
that more than three separation cycles will be needed
in some variegation cycles. Real DNA chemistry and DNA
synthesizers may have larger errors than our
hypothetical 5%. If S_err > 0.05, then we may not be
able to vary six residues at once; variation of 5
residues at once is certainly possible.
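
The trade-off between six and five varied residues is easiest to see from the size of the sequence space. The short sketch below counts amino-acid sequences only; it assumes all twenty amino acids are allowed at every varied residue and ignores stop codons and unequal abundances. The function name is hypothetical.

```python
def n_sequences(n_residues, n_types=20):
    """Number of distinct amino-acid sequences when `n_residues` positions
    are each varied through `n_types` amino acids."""
    return n_types ** n_residues


for n in (5, 6):
    print(n, n_sequences(n))
```

Six fully varied residues give 20^6 = 6.4 x 10^7 sequences, the figure used earlier in this example; dropping to five reduces the space twentyfold.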
Table 2: Preferred Outer-Surface Proteins

Preferred
outer-surface
protein           Reason for preference

coat protein      a) exposed amino terminus,
(gpVIII)          b) predictable post-
                     translational
                     processing,
                  c) numerous copies in
                     virion.

G protein         a) known to be on virion
                     exterior,
                  b) small enough that
                     the G-ipbd gene can
                     rep...

LamB              a) fusion data available,
                  b) non-...

                  a) no post-translational
                     processing,
                  b) ...
                     that causes protein to
                     localize in spore coat,
                  c) non-essential.



WO 90/02809                        PCT/US89/03731

Table 7: Atomic radii




Table 8

Fraction of DNA molecules having n non-parental bases
when reagents that have fraction M of parental nt
are used.

M     .99649    .97716    .92612    .85770    .79433    .63096

f0    .9000     .5000     .1000     .0100     .0010     .000001
f1    .09499    .35061    .2393     .04977    .00777    .0000175
f2    .00485    .1188     .2768     .1197     .0202     .000149
f3    .00016    .0259     .2061     .1854     .0705     .000812
f4    .000004   .00409    .1110     .2077     .1232     .003207
f8    0.        2x10^-7   .00096    .0336     .1182     .080165
f16   0.        0.        0.        5x10^-7   .00006    .027281
f23   0.        0.        0.        0.        0.        .0000089

"most" is the value of n having the highest
probability.
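
The entries of Table 8 are consistent with a binomial distribution over 30 bases, with f0 = M^30. The length of 30 is inferred from the numbers and is not stated in the table, so the sketch below should be read as a plausible reconstruction rather than the authors' own calculation; the names LENGTH and f are ours.

```python
from math import comb

LENGTH = 30  # inferred from f0 = M**30; an assumption, not stated in Table 8


def f(n, m, length=LENGTH):
    """Probability that exactly n of `length` bases are non-parental when
    each base is parental with probability m (binomial model)."""
    return comb(length, n) * m ** (length - n) * (1.0 - m) ** n


m = 0.5 ** (1.0 / LENGTH)  # M chosen so that f0 = 0.5
print(round(m, 5), [round(f(n, m), 5) for n in range(5)])
```

Choosing M so that f0 takes a round value (0.9, 0.5, 0.1, ...) gives the column headings of the table.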



25 



Table 9: best vgCodon

Program "Find Optimum vgCodon."

DO ( t1 = 0.21 to 0.31 in steps of 0.01 )
. DO ( c1 = 0.13 to 0.23 in steps of 0.01 )
. . DO ( a1 = 0.23 to 0.33 in steps of 0.01 )
Comment calculate g1 from other concentrations
. . . g1 = 1.0 - t1 - c1 - a1
. . . IF( g1 .ge. 0.15 )
. . . . DO ( a2 = 0.37 to 0.50 in steps of 0.01 )
. . . . . DO ( c2 = 0.12 to 0.20 in steps of 0.01 )
Comment Force D+E = R + K
. . . . . . g2 = (g1*a2 - .5*a1*a2)/(c1+0.5*a1)
Comment Calc t2 from other concentrations.
. . . . . . t2 = 1. - a2 - c2 - g2
. . . . . . IF(g2.gt. 0.1 .and. t2.gt.0.1)
. . . . . . . COMPARE-ABUNDANCES-TO-PREVIOUS-ONES
. . . . . . . end_IF_block
. . . . . . end_DO_loop ! c2
. . . . . end_DO_loop ! a2
. . . . end_IF_block ! if g1 big enough
. . . end_DO_loop ! a1
. . end_DO_loop ! c1
. end_DO_loop ! t1
WRITE the best distribution and the abundances.
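
The expression for g2 in the program follows from the constraint D+E = R+K once the codon composition is written out. The derivation below assumes that the third codon position is an equimolar mixture of T and G (t3 = g3 = 0.5, the nominal value used in Table 11); that reading of the third base is our assumption.

```latex
\begin{align*}
\mathrm{D}+\mathrm{E} &= g_1 a_2 (t_3 + g_3) = g_1 a_2, \\
\mathrm{K} &= a_1 a_2 g_3 = \tfrac{1}{2} a_1 a_2, \\
\mathrm{R} &= c_1 g_2 + a_1 g_2 g_3 = g_2 \left( c_1 + \tfrac{1}{2} a_1 \right), \\
g_1 a_2 &= g_2 \left( c_1 + \tfrac{1}{2} a_1 \right) + \tfrac{1}{2} a_1 a_2
\quad\Longrightarrow\quad
g_2 = \frac{g_1 a_2 - \tfrac{1}{2} a_1 a_2}{c_1 + \tfrac{1}{2} a_1}.
\end{align*}
```

Here D and E come from GA(T or G), K from AAG, and R from CG(T or G) plus AGG, so equal total abundance of acidic and basic residues fixes g2 once the first-base fractions and a2 are chosen.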






Table 10: Abundances obtained
from optimum vgCodon

Amino
acid    Abundance

A        4.80%
C        2.86%
D        6.00%
E        6.00%
F        2.86%
G        6.60%
H        3.60%
I        2.86%
K        5.20%
L        6.82%
M        2.86%
N        5.20%
P        2.88%
Q        3.60%
R        6.82%
S        7.02%  mfaa
T        4.16%
V        6.60%
W        2.86%  lfaa
Y        5.20%
stop     5.20%

ratio = Abun(W)/Abun(S) = 0.4074

n    (1/ratio)^n    (ratio)^n       stop-free

1       2.454       .4074           .9480
2       6.025       .1660           .8987
3      14.788       .0676           .8520
4      36.298       .0275           .8077
5      89.095       .0112           .7657
6     218.7         4.57 x 10^-3    .7258
7     536.8         1.86 x 10^-3    .6881

lfaa = least-favored amino-acid
mfaa = most-favored amino-acid
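
The abundances in Table 10 can be reproduced from per-position nucleotide fractions. In the sketch below the fractions p1, p2 and p3 are back-calculated by us from the tabulated abundances and are not quoted from the text; the equimolar T/G third position is likewise an assumption, and the function name is hypothetical.

```python
BASES = "TCAG"
# Standard genetic code in TCAG order; '*' marks a stop codon.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"


def abundances(p1, p2, p3):
    """p1, p2, p3: dicts mapping base -> fraction at codon positions 1-3.
    Returns a dict mapping amino acid (or '*') -> expected abundance."""
    out = {}
    for i, a in enumerate(BASES):
        for j, b in enumerate(BASES):
            for k, c in enumerate(BASES):
                aa = AMINO[16 * i + 4 * j + k]
                weight = p1.get(a, 0.0) * p2.get(b, 0.0) * p3.get(c, 0.0)
                out[aa] = out.get(aa, 0.0) + weight
    return out


# Back-calculated fractions (our assumption, see lead-in).
p1 = {"T": 0.26, "C": 0.18, "A": 0.26, "G": 0.30}
p2 = {"T": 0.22, "C": 0.16, "A": 0.40, "G": 0.22}
p3 = {"T": 0.5, "G": 0.5}

ab = abundances(p1, p2, p3)
amino_only = {k: v for k, v in ab.items() if k != "*"}
ratio = min(amino_only.values()) / max(amino_only.values())
for aa in sorted(ab):
    print(aa, round(100 * ab[aa], 2))
print("ratio", round(ratio, 4))
```

With these fractions the acidic total D+E and the basic total K+R agree to within rounding, which is the constraint imposed by the program of Table 9.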





Table 11: Calculate worst codon.

Program "Find worst vgCodon within Serr of given
distribution."

Comment Serr is % error level.
READ Serr
Comment T1i,C1i,A1i,G1i, T2i,C2i,A2i,G2i, T3i,G3i
Comment are the intended nt-distribution.
READ T1i, C1i, A1i, G1i
READ T2i, C2i, A2i, G2i
READ T3i, G3i
Fdwn = 1.-Serr
Fup = 1.+Serr

DO ( t1 = T1i*Fdwn to T1i*Fup in 7 steps)
. DO ( c1 = C1i*Fdwn to C1i*Fup in 7 steps)
. . DO ( a1 = A1i*Fdwn to A1i*Fup in 7 steps)
. . . g1 = 1. - t1 - c1 - a1
. . . IF( (g1-G1i)/G1i .lt. -Serr)
Comment g1 too far below G1i, push it back
. . . . g1 = G1i*Fdwn
. . . . factor = (1.-g1)/(t1 + c1 + a1)
. . . . t1 = t1*factor
. . . . c1 = c1*factor
. . . . a1 = a1*factor
. . . . end_IF_block
. . . IF( (g1-G1i)/G1i .gt. Serr)
Comment g1 too far above G1i, push it back
. . . . g1 = G1i*Fup
. . . . factor = (1.-g1)/(t1 + c1 + a1)
. . . . t1 = t1*factor
. . . . c1 = c1*factor
. . . . a1 = a1*factor
. . . . end_IF_block
. . . DO ( a2 = A2i*Fdwn to A2i*Fup in 7 steps)
. . . . DO ( c2 = C2i*Fdwn to C2i*Fup in 7 steps)
. . . . . DO ( g2 = G2i*Fdwn to G2i*Fup in 7 steps)
Comment Calc t2 from other concentrations.
. . . . . . t2 = 1. - a2 - c2 - g2
. . . . . . IF( (t2-T2i)/T2i .lt. -Serr)
Comment t2 too far below T2i, push it back
. . . . . . . t2 = T2i*Fdwn
. . . . . . . factor = (1.-t2)/(a2 + c2 + g2)
. . . . . . . a2 = a2*factor
. . . . . . . c2 = c2*factor
. . . . . . . g2 = g2*factor
. . . . . . . end_IF_block
. . . . . . IF( (t2-T2i)/T2i .gt. Serr)
Comment t2 too far above T2i, push it back
. . . . . . . t2 = T2i*Fup
. . . . . . . factor = (1.-t2)/(a2 + c2 + g2)
. . . . . . . a2 = a2*factor
. . . . . . . c2 = c2*factor
. . . . . . . g2 = g2*factor
. . . . . . . end_IF_block
. . . . . . IF(g2.gt. 0.0 .and. t2.gt.0.0)
. . . . . . . t3 = 0.5*(1.-Serr)
. . . . . . . g3 = 1. - t3
. . . . . . . CALCULATE-ABUNDANCES
. . . . . . . COMPARE-ABUNDANCES-TO-PREVIOUS-ONES
. . . . . . . t3 = 0.5
. . . . . . . g3 = 1. - t3
. . . . . . . CALCULATE-ABUNDANCES
. . . . . . . COMPARE-ABUNDANCES-TO-PREVIOUS-ONES
. . . . . . . t3 = 0.5*(1.+Serr)
. . . . . . . g3 = 1. - t3
. . . . . . . CALCULATE-ABUNDANCES
. . . . . . . COMPARE-ABUNDANCES-TO-PREVIOUS-ONES
. . . . . . . end_IF_block
. . . . . . end_DO_loop ! g2
. . . . . end_DO_loop ! c2
. . . . end_DO_loop ! a2
. . . end_DO_loop ! a1
. . end_DO_loop ! c1
. end_DO_loop ! t1
WRITE the WORST distribution and the abundances.
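
The "push it back" step of the program above keeps the dependent nucleotide fraction within the allowed error band and rescales the other three so that the four still sum to one. A minimal sketch of that single step is given below; the function name and argument layout are hypothetical, and the numbers in the example are arbitrary.

```python
def push_back(x, y, z, w_intended, serr):
    """Given three independently perturbed fractions x, y, z, compute the
    fourth as w = 1 - x - y - z. If w lies more than `serr` (relative) from
    `w_intended`, clamp it to the edge of the band and rescale x, y, z so
    that the total is again 1. Returns (x, y, z, w)."""
    w = 1.0 - x - y - z
    low = w_intended * (1.0 - serr)
    high = w_intended * (1.0 + serr)
    if low <= w <= high:
        return x, y, z, w
    w = low if w < low else high
    factor = (1.0 - w) / (x + y + z)
    return x * factor, y * factor, z * factor, w


# Arbitrary example: all three fractions at +5%, intended fourth is 0.30.
t1, c1, a1, g1 = push_back(0.273, 0.189, 0.273, w_intended=0.30, serr=0.05)
print(t1, c1, a1, g1)
```

Unlike the pseudocode, this version returns new values instead of overwriting the loop variables, which avoids disturbing the enclosing DO loops.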
Table 12: Abundances obtained
using optimum vgCodon assuming
5% errors

Amino                        Amino
acid    Abundance            acid    Abundance

A        4.59%               C        2.76%
D        5.45%               E        6.02%
F        2.49%  lfaa         G        6.63%
H        3.59%               I        2.71%
K        5.73%               L        6.71%
M        3.00%               N        5.19%
P        3.02%               Q        3.97%
R               mfaa         S        7.01%
T        4.37%               V        5.00%
W        3.05%               Y        4.77%
stop

ratio = Abun(F)/Abun(R) = 0.3248

n    (1/ratio)^n    (ratio)^n       stop-free

2       9.481       .1055           .8973
3      29.193       .03425          .8500
4      89.888       .01112          .8052
5     276.78        3.61 x 10^-3    .7627
6     852.22        1.17 x 10^-3    .7225
7    2624.1         3.81 x 10^-4    .6844
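
The lower part of Table 12 appears to tabulate powers of the lfaa/mfaa ratio and of the stop-free fraction for n varied residues. The sketch below recomputes such rows; the column interpretation, the per-residue stop-free value of 0.9473 (the square root of the n = 2 entry) and the function name are our assumptions.

```python
def power_rows(ratio, stop_free, n_max=7):
    """Return rows (n, (1/ratio)**n, ratio**n, stop_free**n) for n = 1..n_max."""
    return [(n, (1.0 / ratio) ** n, ratio ** n, stop_free ** n)
            for n in range(1, n_max + 1)]


rows12 = power_rows(0.3248, 0.9473)
for row in rows12:
    print(row)
```

Read this way, varying six residues with 5% synthesis error makes the least-favored sequence roughly 850 times rarer than the most-favored one, and about 28% of the clones carry at least one stop codon.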
Table 13: BPTI Homologues
R # = residue number
1 BPTI

2 Engineered BPTI from MARK87

3 Engineered BPTI from MARK87

4 Bovine Colostrum (DUFT85)

5 Bovine Serum (DUFT85)

6 Semisynthetic BPTI, TSCH87

7 Semisynthetic BPTI, TSCH87

8 Semisynthetic BPTI, TSCH87

9 Semisynthetic BPTI, TSCH87

10 Semisynthetic BPTI, TSCH87

11 Engineered BPTI, AUER87

12 Dendroaspis polylepis polylepis (Black mamba) venom I
(DUFT85)

13 Dendroaspis polylepis polylepis (Black Mamba) venom K
(DUFT85)

14 Hemachatus hemachates (Ringhals Cobra) HHV II
(DUFT85)

15 Naja nivea (Cape cobra) NNV II (DUFT85)

16 Vip„r - . FW II (XAKA74) 

17 Red sea turtle egg white (DUFT85)

IB Snail mucus (Mk..P^^ ^ (FASH78) 

19 Dendroaspis angusticeps (Eastern green mamba)
C13 S1 C3 toxin (DUFT85)
22 Dendroaspis polylepis polylepis (Black
Mamba) E toxin (DUFT85)
23 Vipera ammodytes TI toxin

v tv. < i '8 i : 

25 Bungarus fasciatus VIII B toxin (DUFT85)
a anemone) S II 

(DUETS 5) 

27 Hor i il-14 "inactive" domain 
(DUETS 5) 

28 Hc„ HX-X4 "active^ domain 
(DUFTS5) 

29 beta bungarotoxin B1 (DUFT85)

30 beta bungarotoxin B2 (DUFT85)

si Bov±m mxmti m xx cfxgrss} 

^ C ~ t < < 3 7) 

(silkworm) SCX-IXI (S AS AS 4) 

Notes:

a) both beta bungarotoxins have residue 15 deleted.

b) B. mori has an extra residue between C5 and C14; we
nave assigned F and G to residue S , 

c) all natural proteins have C at 5, 14, 30, 38, 51, & 55.

d) all homologues have F33 and G37.

e) extra C's in bungarotoxins form interchain cystine
bridges.
Table 14: Tally of Ionizable Groups,
BPTI homologues.
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10. 













* is sum of K + R + NH - D - E - CO2, approximate charge on
molecule at pH 7.0.

# is sum of K + R + NH + D + E + CO2, i.e. number of ionized
groups at pH 7.0.
Table 15: Amino acids observed at each Residue,
BPTI homologues
Table 16: Exposure in BPTI
Coordinates taken from
Brookhaven Protein Data Bank entry 6PTI.

HEADER PROTEINASE INHIBITOR (TRYPSIN) 13-MAY-87 6PTI
COMPND BOVINE PANCREATIC TRYPSIN INHIBITOR
COMPND 2 (/BPTI$, CRYSTAL FORM /III$)
AUTHOR A.WLODAWER

Solvent radius = 1.40
Atomic radii given in Table 7
Areas in Angstroms squared.

          Total   Not covered              Not covered
Residue   area    by M/C        fraction   at all        fraction

"Total area" is the area measured by a rolling sphere
of radius 1.4 A, where only the atoms
within the residue are considered. This
takes account of conformation.

"Not covered by M/C" is the area measured by a rolling
sphere of radius 1.4 A where all main-chain atoms
are considered; fraction is the exposed
area divided by the total area. Surface
buried by main-chain atoms is more
definitely covered than is surface covered
by side group atoms.

"Not covered at all" is the area measured by a rolling
sphere of radius 1.4 A where all atoms of the
protein are considered.
Table 17: Plasmids used in Detailed Example
Phage    Contents

<dgx mmpiB with m& n/Mt xi/ass x/.m 

ll/Sau I adaptor 
pLG2 LGl »ith a« p£ and CoIBX of pBE322 cloned 

into Ml II/&SS I sites 
pLG3 pLG-2 with, &cc 1 site removed 

P&S4 pLG3 with first part at gsp~,pfod gene 

cloned into B.sr ll/Saa I sites* 

Ml. IX/Asu II sites created. 
pLGS pl^34 with second part of osp-obd gene 

cloned into MX H/M II sites, $%?K I 

site created 

SUm P&S5 with third part of asg^ gene 

cloned into to WMiS X sites f I 
site created 

pi®? pLG6 with last part of oso-obd gone 

cloned into Bba X/M.1 IX sites 

pLG8 pLG7 with disabled osp~phd gene, same 

length 

pLG9 pt&7 mntated to display BPTI (V15 BPTr ) 

pLGXO pLGS + t§t E gene - m& R gene 

pliSll pLG9 + tet R gene - gene 



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tafel* as j Annotated Sequence of jpM f&n® 



5 ' ~ C j GGA ! CCG j TAT j CCA j GGC ] TTT | AC.A j CTT j TAT j 



I C3CT | TCC | GGC j TCG j TAT | AAT | GTG | TGG | 



| AAT | TGT i GAG j CGC j ATA j ACA j ATT | 
i™™laS™9Bi£Ste ~i 



| CCT | AGG j AGS | GTC j ACT j 



J M j "k | & { 85 | 1 { V j 1 j It j « j It j 
j 1 j 2 ) 3 j 4 1 S | 6 i 7 | 8 j S j 10| 
| ATG | AAG 1 AAA j TCT j Cm\ GTS j QTT lM,Q\QCS | AGG j 



I a | t ] 1 j v | p | ® I I 

| IX | 12 1 13 1 14 j 15 j IS | 17 j 18] 19 | 20 j 
f GWf SfiT j GTC | GCG | ACC 1 CTG | GTA j CCG [ ATG [ CTO j 

i^teLXl jJgaa J Si 



| <a I £ | a I r | | 4 | f | c J 1 1 # ! 
| 2X1 aa | .2:3,j 24 ) 25 j 28 j 27 j 28 1 29 j 30 j 
| TCT | TTT j GCT j CGT | CCG I CAT 1 TTC ! TGT j CTC I GAG | X? S 



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Table 25, continued- 

U iixxi jjma I 

J.....¥M„.I...j. 



! P ! P ! F I t j g j p | c j k | a | r j 

| 31} 32 1 33} 34 1 W\ 36} 3?| 38} 39j 40 1 
| CCS j CCA I TM ! j ACT \ GC-.G \ CCC j TQC | AAA | ©CG | CSC } 

lMm...jj.l 



Asa X j 
Bra XI 



I i I i ! ! y I £ | j f x* | a j k j 
| 41 | 42 | 43} 44} 45 | 46f 47} 48} 49 j 

j ATC } ATC | CST { TAT f TTC } TAC j AAC j GCT j AAA j 23 5 



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Table 25, continual* 

| a | g I I ! c { f j t j f | v f y j g | 9 | 
} SO | 51 j 52} S3 1 54} 55 j 56 j 57] 58} 59} 60 1 
| GCA | GGC | CTG j TGC { CAG | ACC } TTT j G?A j TAC } GGT j GGT { 
j 3tu Ij USfiJU 
j Xea 1 I 



} c | r } a | k \ r | » } n j f I & j 
| 6.1 1 62 1 63 1 64} 65 1 66} 6? j 68} 69 } 

|TSC|CSTjGCT | AAG j CGT j AAC j AAC| TTT } AAA j 295 

XJBsslS~~1 



j s ] a { e U I c } » } r | t } c j g | 
j 70 } 71 | 72 | 73} 74 f 75 j 76} 77} 781 79 j 
{ TCC | SCC } GAA { GAT | TGC f XSG j CGT j ACC } TGC \ GGT j 325 
iXmaXIXi j Sph Jj 



| g | a | a | e | g j d f a. | 
| 80} 81 ! 82} 83} 84} 8S| 86} 
} GGC } GCC } GCT j GAA } GGT | GAT f GAT { 



p j a | k. { a } a | 
87 | 88} 89 J 90} 91 j 
CCG|GeC|AAA|GCGlGCCj 361 
j Sfi 1 1 




j r j n | : * ! . i | g 1 a I s [ t i t I 

Sable 25 r corrtimied. 
| 92 1 93 1 94 j 95 f 96 j S?j. 98? 9S|lOO| 
| TTT j AAC I TCT i CTG j CAA j GOT j TGT I SCT j ACC j 



I « i -y j lis] y 1 > i * I 

\ 101 1 102 \ 103 j 104 ) 10S 1 106 1 107 j 

| CAA | TAT | ATC j GGT j TAG j GCG TGG j 409 



| a j * j v | v ] v | 
j 108 i 103 j 110 j 111] 1.12 j 

j SCC. I ATG | GTS | GTG | GTT j 424 

1 Ptx 1 I 

Umua. 



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T&Me '2$, continued. 



1 1X3 | 114 | 1X5 j 116 1 1X7 j 118 | 1X9 \120\ 
| ATC | GTT | GGT j GOT \ ACC | ATC j GGT | ATC \ 



\ k j 1 | f j & j 3s j £ I t j 8! j i * j 
1 121 1 122 j 123 1 124 1 125 1 126 i 127 | 128 1 129 | 130 j 
i AAA | CFG | TTT } AAS- 1 AM. | TTT | ACT | TCG | AM.j SCS ] 



| 131 | 132 1133 | 124 | 

| TCT | TAA ] TAG j TGA \ GGT \ TAG | CAG i TCT | 



| AAS j CCC \ GCC j TAA j TGA j GCG | S<2C | TTT \ TTT f TTT j 532 
\^or....._ — i 



| CCT j GAG j 6 -3' 
j sau I .. [ 



Mot© th& following enssyme equivalences f: 



.Ma 111 SIS I 

Acc III - Bspil 11 

Or a xx - Kcoaiili 1 

ASM 11 - SstB 1 

sau X - Bsn36 X 



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Talkie 27: DNA„synthl 

s ' .te j.^.i§a?c ! ssa j ccg | tat | ccAiSi^isj y o pa; cmzmi 



o!ig#4 - 3'- gt tsa 



/ 3' « olig#3 
egg ega gga age trfct cgc 



J TCT | T&A | TAG | TGA ; GST j TAG \ C&G \ TCT | 
aga att ate act cca atg gtc aga 



| &AG | CCC 5 : X'TT \ TTT | 

ttc ggg egg att act cgc ceg aaa aas aaa 



i CCT | GAG | 1 GGT j G?>G \ €G 
gga etc cgt cca etc gc - S * 



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«Top" strand 99 

"Bottom" strand 100 

Overlap 33 

Net length 158 



Table 27, continued. 



(14 c/g and 3 a/t) 



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Table 28 i m&j&&%2 



S ' » | gca | oca | sag j 
j spacer | 



| CCT j AGG : ACG j CTC j ACT \ 
I Ayr IX j 

!.. 



j » I k j it | » f i j v -| i | k j x \ 4 \ 

\ ■ X I a j 3 | 4 f 5 j <6 j ? j S I 9 I 10 j 
I ATG | AAG j MA j TCT | CTG | GTT I CTT | A&G I GCT j AGC { 



1 ■ v j a | v | a \ t I 1 | v j p | m | 1 j 
i 11| 12 1 13 1 141 15 1 X4\ 17 1 18 j 19 j 20| 
| GTT f GCT j GTC \ GCG j ACC j CTG | GTA } CCS { ATG j CTG j 
1 JUKfflLli 



j I 1 I I .a |,r j p j & | f | c | X 1 e 
| 21 1 22 1 23 j 24 1 25 j 26 f 27 j 281 29 j 3C 
| TCT | TTT j GCT j CGT i CCS | GAT j TTC | TGT f CTC ) SAG 

ites&i I 



I P ! P I ? I t j g I p j c 1 k I a i r ! 
1 31 j 32 | 33 | 34 | 35 j 36 j 37 j 33 j 3Sj 40 j 
I I CCA j TAT | ACT j GGG | CCC \ TGC | AAA j GCG | CGC f 



WO 
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i i j i I r ! 

jatcjatcjcgtj 



Table 28, continued. 



....gra.,..I,I, , j 



| i;2 7 j IS] 12 ? | 

jACTjTOfMajgcgjgctjgcgj - 3' 



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Tafele 30 i BNA.saq3 



f. * j r j 



1 i | i 1 r ] y j t | y j a f a f 1c | 
1 41 | 42 | 43 | 441 4S| 46} 4? j 48! 4$) 
\ Kml ATC I CGT j '3?M? j TTC | TAC | AAC-j 



I a | g | % I c I g j t j £ j v } y j g j g | 
j SO | 51] 52 | 53 j 54 j 53 j 56 j 5? j 58} S9j 60 j 
j GCA j mc } CTG | TGC } GAG | AGC f TTT } GTA j 5»AC } GGT j GGT | 



I 61} 62 j 631 64 | 65} 66} 67] 66 j 69 f 
j TGC i CGT | CCT 1 AAG \ CGT}&AC}AAC | TTT| AM} 
J HS P 1 j 



| -» \ ft } * | d } a ] * } r j fc j c | g f 
| 70] 71] 72j 73} 74} 75 j 76 j 77} 78} 79 i 
| TCG j GCC j G&A I GAT } TGCiATGj CGT j ACC j TGC j GG1 j 



I 9 ! a. 1 

| 80] 61] 



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Table 30 f continued. 

LMm,J Lib 



! t j s 1 * | 
j 12? | 12$ | 129 | 

I ttt ! act? | Tea | A&a | »| Wj.ecg j - 3' 



23 4 



Table. 32; BKAjs»q4 



j g j a j a j e j f M U | 
5> | 80] Blj 821 *3.j 84J as l »«l 

j CCt I cgc | CCt 1 <30C | SCC [ GCT 1 GAA j SGT | GAT j GAT j 

■LmcaL 1 Bbe i j. 



I p i a | k j a i a f 
j 87 | 88 | 89| 90| §1[ 



1 t 1 n h 1 1 U U ! s 1 a i t j 
j 93 1 93 1 94 i BB\ , m\ 97 j 98 j 99|100| 
} TTT i AAC j TCT j CTG j CAA | GCT \ TCT j GCT j AGO | 
iHlnd 3 | 



!e!i!iU!l|a!v! 
j 101 1 102 | 103 1 104 | 105 j 106 j 10? } 
\ GAA | TAT f ATC f QGT \ TAG } GCG j TGG j 



j a | JS | Y | V | V | 
| 108 | 109 | 110 | 111 | 112 | 
I QCC | ATG j GTG j GTG j GTT \ 

i BstX. I _l 

1 »QO *1 
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Table 3.2, continued. 
1 i i v | g | a ! t | i j <r f i I 
1 113 1 114 j IIS 1 116 ( 11? 1 118 j 113 1 120 ! 
! ATC | GIT j GOT j GOT | ACC | ATC ) GGT j ATC | 



\ * | 1 j X ! k I k f f j * j « j ■ 3k j 
1 121! 122 1 123 ] 124 j 125 J 126 j 12? } 128 j 129 j 

| AAA j CTG | TTT j AAG j AAA j TTT j ACT j TCG | AAa | fif eg | tog I age | - 3 < 






Table 34: Some interaction sets in BPTI



3*s. Diff . . 

i aas j^^^. — jmk * a 3 4 S 



-5 


2 


D -32 










-4 


2 


E -32 










-3 


5 


T F F % ^29 










~2 


10 


23 S3 Q2 T2 H G X X E ~XS 










«*1 


10 


04 T2 F2 Q2 E G H K E *18 










1 


IP 


R21 &2 K2 H2 F L X 1' £ D 


E 






S 


2 




P20 R4 A2 H2 M E V P & 






s 


5 




10 


D1S' T3 R2 P2 S Y G A L 






4 


ss 


4 


7 


F19 D4 L3 Y2 12 A2 S 








§ 


5 




C33 






X 


X 




10 

§ 


Xll E5 H4 3S3 Q2 12 Y2 D2 $ E 
X1S 111 !K2 S Q 






4 

S 4 






7 


P26 H2 M I L G 1 






3 4 








P17 m ¥3 R2 Q L K Y F 






3 4 




IS 


10 


Yll 17 D4 A2 M2 R2 V2 SID 




& 


S 4 




11 


W 


T17 PS A3 R2 X S Q Y V K 




1 S 


3 4 




12 


2 


G32 K 




X 


X X 
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Table 34, continued. 



X<i 


3 


1?22 B.S L3 F X 


F 


X S 4 S 


14 


3 


C3X \T & 


0 


1 S3 S 3 


15 




k &a v) m T<? «*? 1? & A I H F 


K 


1 s 3 4 S 


16 


' C 




A 


1 s s s 5 


17 


12 


1?*S *> r-, v-5 «"> S? F2 L i f 5 ? 


E 


12 3 S 


IS 


6 


Y'Y-l Mi T.>> V? it* 

n4 ff4 v^s A 


I 


1 s s 5 


19 




T1 i r» "DA K"? T< n 
JuXi. ifiv Kt3 >»sS jx* i>> W 


X 


X 2 3 £3 


20 


ts 


A/ U4 v 


R 


s s s 5 


21 


4 




t 


2 s s s 


22 






F 


S 3 4 


23 


2 


f 32 F 




S S3 


24 


4 


H26 K3 D3 S 


H 


S 3 


25 






A 


s s 


-\ s- 


9 


»1 ; a<\ 'X>"> «t'J S 1 '? A ?-? V 


K 


s 3 4 


2:7 




»1 o ea ft-* t"& <P5 


A 


2 3 4 


-'-^ 


? 








29 




T O Ci7 K~t W li.2 M S T N 


I* 


2 3 


in 






c 


X X X 




7 


Q12 BIX L4 &2 ¥2 if H 




2 3 4 






T13 PS &4 P3 B2 L2 G V S R A 


T 


2 3 s 


33 


1 


F33 


F 


X X X X 


34 


11 


Vll 18 2*3 D3 N2 02 F E S> R K 




12 3s 


35 


2 


Y31 W2 


¥ 


S S S S 


36 


3 


G27 S5 R 


e 


1 


3? 


1 


G33 


<? 


X X 


3S 


3 


C3X T K 


e 


1 S 5 


39 


7 


R13 69 K4 Q3 132 F M 


R 


1 4 S 
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Table 34, continued. 



40 


a 


G22 


All 


A s 


s 




41 


3 


H20 


&xi m 


K 


4 




42 


9 


All 


E9 S4 (33 H2 15 Q K W 


R 


s 




43 . 


2 


83 X 


02 


Iff 






44 


3 


R2X 


mi k 


H 






45 


? 


P32 




F 






46 


8 


K24 


12 S2 D H V ¥ K 


X 






47 


a 


TX9 


SX4 


S 






48 


9 


All 


19 14 T2 m L2 E K D 


A 






49 


7 


E1B 


D6 A2 Q2 K2 T H 


E 






SO 


6 


lis 


D12 L2 M Q K 


D 






SX 


1 


C33 




C 






52 


7 


RX3 


HID L3 B3 m K V 


K 






§3 


8 


R2X 


Q3 E2 H2 C2 <S K D 


R 






54 


7 


T23 


A3 72 12 I Y K 


T 




5 


55 


1 


C33 




C 






66 




G15 


V8 13 12 12 A L S 


G 






5? 


8 


sis 


V4 A3 P2 -2 R L N 


G 






58 


8 


All 


-10 P3 3D S2 Y2 R F 


A 






S9 


9 


-24 


G2 0 & A t S P 1 








60 


6 


-28 


Q E 1 G D 








61 


3 


-34 


T P 








62 


a 


-3 a 


D 








S3 


2 




K 








64 


2 


-32 


S 









s indicates secondary set.

x indicates in or close to surface but buried and/or highly
conserved.
Table 35:
Distances from C_beta to
Tip of Side Group
in Angstroms

Amino Acid type 
A 

C (reduced) 
D 



4,0 
2,5 
5.X 
2.6 
3.3 
2.4 
2.4 
3.5 
6.0 
1.5 
1,5 
1.3 
S.3 
5.7 



0.0 
1.8 
2.4 

4,3 



Notes: These distances were calculated for standard model parts
with all side groups fully extended.
Table 36: Distances, BPTI residue set #2
Distances in Angstroms between C_beta's.
Hypothetical C_beta was added to each Glycine.

R17 I19 Y21 Y23 G28 L29 Q31 T32 V34 A48



X19 


T* 


7 




























¥21 


15. 


1 


8,4 


























S3 7 


22* 


6 


17.1 


12.2 
























G28 


26. 


6 


20.4 


13,8 


5, 


3 




















L23 


22. 


5 


15.3 


9,6 


S, 


1 


5,2 


















031 


18.1 


10.4 


6,8 


6, 


s 


10,6 


6> 


8 














T32 


11.7 


3.2 


0,1 




0 


15,5 


10,9 


5,4 












V34 


5. 


6 


6,3 


11.0 


17,6 


21,7 


IS, 




11,4 


8,2 








A48 


IS. 5 


11.0 


5.4 


12, 


6 


13,3 


8,4 


8,8 


■8, 


,3 


15,7 




E49 


22. 


0 


14 , 7 


8,3 


16,9 


16,1 


12,2 


13 , 9 


13.3 


19.3 


5, 5 




a?- 




IS. 3 


8.6 


12. 


2 


10.3 


7, 


6 


11-3 


13. 


a 


20. 


.0 


6.2 


PS 


14, 


0 


11.3 


9.0 


12.2 


15.4 


13.3 


7.9 


9, 


>2 


8, 


.7 


13.3 


Til 


9, 




11.2 


13 . 5 


xa. 


8 


22.3 


13.8 


13.5 


12, 


.1 


5.. 


,7 


18.5 


K1S 


7* 


9 


14.6 


20.1 


27. 


4 


31.3 


27. 


9 


21,4 


18, 


,1 


10, 


.3 


24*6 


A16 


$. 


5 


1.0.1 


IS » 9 


as, 


2 


28,5 


24, 


# 


18,6 


14 


.5 


8> 


.6 


19.8 


IIS 


6. 


1 


6.0 


11,2 


21. 


3 


24,4 


20. 


2 


14,7 


10 


,4 


7, 


,9 


15*0 


E20 


10, 


6 


5*9 


5.4 


16. 


0 


18,5 


14. 


« 


9.8 




,9 


7. 


,8 


10 , 2 


F22 


15, 




10,9 


5,6 


10. 


• 5 


12 . 8 


10, 


3 


6,2 


8 


.1 


10. 


.8 


10 . 3 


H24 


19 . 


,9 


14.7 


9 . 4 


4-. 


1 


7 . 3 


6, 


1 


4,3 


10 


.0 


14. 


,7 


11,4 


K26 


24. 


>4 


20.1 


15.2 


5, 


.4 


7,7 


9. 


8 


10,1 


15 


.3 


19. 


,0 


17.0 


C3G 


IS. 


.9 


12.1 


4.6 


S< 


v8 


9.5 


5. 


.3 


5,9 


8 


.2 


14. 


,9 


4.9 


133 


10. 


,8 


7.4 


7.7 


13, 


.6 


16.4 


13. 


,0 


6.6 


5 


.6 


5. 


.5 


12 . 2 


Y3S 


8. 


,4 


7.4 


9.4 


18, 


.4 


21.4 


17. 




12.2 


9 


3 


5, 


.8 


14.4 


S47 


17. 


,6 


10.6 


6,6 


17. 


.3 


17.9 


13, 


A 


12 , 6 


10 


.4 


15, 


.9 


5,3 


BSO 


20. 


,8 


13 . 6 


7,2 


17. 


<2 


16.8 


13, 


»S 


13.6 


12 


,9 


17 




7.6 


CS1 


18. 


,3 


12.2 


4,0 


12. 


<X 


12.2 


8. 


-8 


S.8 


9 


.7 


IS 


.3 


5.4 


ESS 


25. 


,4 


18.6 


11,0 


17. 


,2 


15. Q 


13. 


,0 


15.7 


16 


,7 


22 


,3 


3.7 


E39 


15. 


,4 


16.9 


17 ,1 


24. 


,9 


27.2 


24, 


.9 


20.1 


IS 


.7 


13. 


,8 


22.3 
Table 36, continued.

Distances in Angstroms between C_beta's.
Hypothetical C_beta was added to each Glycine.
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P9 
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K1S 


AX6 


1 1 8 


R2 0 


F2 2 


N24 

























P9 




IS s 


















Til 


22 > 1 






















2? < 5 


2 8 . 7 


lb. 4 
14 , 9 


9.5 
9,8 


6.2 












lis 




13.5 


12.2 




10 . 4 


4 * 9 










R2 0 








9 . 4 


14 « § 


10, 6 


6,2 












11* 4 


4 . 1 


10 . 6 


X9 , 1 


18,3 


12.7 


6.9 












8 . 4 


15 . 3 


24 » 1 


21,9 


18, 2 


12 , 7 


6,6 
















2 6 , 6 


23,3 


18 , 1 


11.6 


5 , 9 


C3 0 












2 0 . 2 


15 , 7 


3 . 8 


6.8 


6.9 


?33 


16.3 


IS. 4 


4.2 


7.1 


15.0 


13.8 


9 , 6 


5,1 


5.6 


9,3 


Y3S 


17.2 


17 . 8 


7.8 


5.8 


11,0 


7,6 


4,9 


4,3 


8.8 


14,8 


S47 


4.? 


9.1 


IS. 3 


18 » 5 


23,1 


17,6 


12,8 


9.1 


12.0 


IS. 3 


DSO 


S,S 


7,7 


14,7 


IS. 6 


24.2 


19.2 


14 , 7 


9.9 


11.0 


14.7 


CSI 


7,1 


5.4 


11.0 


16 , 4 


23 . 5 


19.2 


14.6 


8.7 


6,9 


9.6 


R53 


6.3 
1. A method of obtaining a protein that binds a
predetermined target that comprises:

a) preparing a variegated population of replicable
genetic packages, each package including a nucleic
acid construct coding on expression for an
outer-surface-displayed potential binding protein
other than a single chain antibody comprising (i) a
structural signal directing the display of the
protein on the outer surface of the package and
(ii) a potential binding domain for binding said
target, where a plurality of different potential
binding domains are displayed by said population,

b) causing the expression of said proteins and the
display of said proteins on the outer surface of
such packages,

c) contacting the packages with target material so
that the potential binding domains of the proteins
and the target material may interact, and
separating packages bearing a binding domain that
binds target material from packages that do not so
bind, and

d) recovering and replicating at least one package
bearing a successful binding domain,

preferably further comprising (e) determining the
amino acid sequence of a successful binding
domain, and more preferably, further comprising (f)
preparing a new variegated population of
replicable genetic packages according to step (a),
the parental potential binding domain for the
potential binding domains of said new packages
being a successful binding domain whose sequence
was determined in step (e), and repeating steps
(b)-(e) with said new population.
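
The cycle of steps (a) through (f) can be illustrated with a toy simulation. The sketch below is not the claimed laboratory procedure: `toy_affinity`, the `ideal` sequence, and the choice of varying two positions per round are hypothetical stand-ins for display and affinity separation, chosen only so that the loop is runnable.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids


def variegate(parent, positions):
    """Step (a): every variant of `parent` with any residue at `positions`."""
    variants = []
    for residues in product(AMINO_ACIDS, repeat=len(positions)):
        seq = list(parent)
        for pos, aa in zip(positions, residues):
            seq[pos] = aa
        variants.append("".join(seq))
    return variants


def toy_affinity(domain, ideal):
    """Hypothetical score: positions matching an assumed ideal binder."""
    return sum(a == b for a, b in zip(domain, ideal))


def select(population, ideal, keep=1):
    """Steps (c)-(d): keep the `keep` best-scoring members."""
    ranked = sorted(population, key=lambda d: toy_affinity(d, ideal), reverse=True)
    return ranked[:keep]


def evolve(parent, ideal, position_sets):
    """Steps (a)-(f): one round of variegation and selection per position set."""
    for positions in position_sets:
        population = variegate(parent, positions)
        parent = select(population, ideal)[0]  # sequence "determined" in step (e)
    return parent


if __name__ == "__main__":
    print(evolve("GGGGGG", "KAIRFN", [(0, 1), (2, 3), (4, 5)]))
```

With these assumed inputs each round fixes two more positions, and the winner of one round serves as the parental domain of the next.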

2. The method of claim 1 wherein the population of
replicable genetic packages of step (a) is
obtained by:

i) preparing a variegated population of DNA
inserts, each of which comprises a first
sequence which codes on expression for a potential
binding domain and a second sequence encoding a
signal directing that the encoded protein be
displayed on the outer surface of a chosen
replicable genetic package, and

ii) incorporating the resulting population of DNA
constructs into the chosen replicable genetic
packages to produce a population of replicable
genetic packages,

wherein preferably (1) said population is
characterized by the display of at least 10 s but
not more than 10 s different potential binding
domains and/or (2) from 1 in 10^4 to 1 in 10^9 of
the packages of said population display the same
potential binding domain.
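
As a rough check on the preferred range in part (2), the arithmetic below counts how many fully varied residues give a library in which each domain makes up between 1 in 10^4 and 1 in 10^9 of the population. It assumes, as a simplification not stated in the claim, that all 20 amino acids are allowed at each varied position and that every variant is equally abundant; the function names are illustrative.

```python
def library_size(n_varied, residues_per_position=20):
    """Number of distinct domains when n_varied positions are fully varied."""
    return residues_per_position ** n_varied


def positions_in_range(smallest=10**4, largest=10**9, max_positions=10):
    """Values of n_varied whose library size lies within [smallest, largest]."""
    return [n for n in range(1, max_positions + 1)
            if smallest <= library_size(n) <= largest]


if __name__ == "__main__":
    for n in positions_in_range():
        print(n, library_size(n))
```

Under these assumptions, fully varying four to six residues falls inside the range, while three residues (8,000 variants) give too few variants and seven (1.28 x 10^9) slightly too many.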

3. The method of claim 1 wherein, in step (a), the
potential binding domains encoded by the nucleic
acid constructs are each related in sequence to a
parental potential binding domain by a limited
number of amino acid substitutions in the amino
acid sequence of said parental potential binding
domain, and, preferably, the level of variegation
of the population is chosen such that the packages
displaying potential binding domains obtained by
single amino acid substitutions in the amino acid
sequence of the parental potential binding domain
are present in detectable amounts, and preferably
the initially chosen parental potential binding
protein has at least one stable binding domain and
said domain has a melting point of at least 60°C
and is stable over a pH range of at least 3.0-8.0.

4. The method of claim 1 wherein the displayable
potential binding protein is a chimeric protein,
and preferably, wherein said signal is provided
by a segment of said chimeric protein which is
essentially identical in amino acid sequence with
at least a functional portion of a natural outer
surface protein encoded by said genetic package or
a cell naturally infected by said genetic package,
said portion directing the transport of said
chimeric protein to the outer surface of the
genetic package.

5. The method of claim 3 wherein the parental
potential binding domain is initially chosen to be
one which is over m% homologous with a domain of
a known protein, the latter domain having a
melting point of at least about 60°C.
6. The method of claim 5 wherein the initially chosen
parental binding protein does not preferentially
bind the predetermined target.

7. The method of claim 3, said target material
comprising one or more discrete molecules, said
parental potential binding domain being
characterised as a sequence of amino acids,
further comprising identifying an interaction set
of amino acids which are on the surface of the
parental potential binding domain and which can
all simultaneously touch a single molecule of the
target material, and obtaining potential binding
domains by substituting a different amino acid for
one or more of the amino acids in said interaction
set.



8. The method of claim 1 wherein the target material
is a non-macromolecular organic compound and the
potential binding domains comprise greater than
about 80 amino acid residues.

9. The method of claim 1 wherein the target material
is a non-macromolecular organic compound and the
potential binding domains comprise greater than
80 amino acids.



10. The method of claim 1 wherein the target material
is a mineral insoluble in aqueous solution.

11. The method of claim 1 wherein the target is an
inorganic molecule or complex ion that is stable
in aqueous solution.

12. The method of claim 1 wherein the target is an
organometallic compound that is stable in aqueous
solution.

13. The method of claim 1 wherein the target material
is a general protease, wherein the immobilised
target material is first incubated with an
irreversible or covalent inhibitor to inactivate
the protease.

14. The method of claim 1 wherein the replicable
genetic package is a cell or virus that can be
affinity separated and retain viability.

15. The method of claim 5 wherein the known binding
protein is an enzyme, the activity of which has a
deleterious effect on the replicable genetic
package, the host of the replicable genetic
package, or the target, wherein the majority of
the nucleic acid constructs code on expression for
an analogue of the known binding protein that does
not have such deleterious enzymatic activity.

16. The method of claim 1 wherein the target contains
ionizable groups and the pH of the solutions of
the intended use and the pH of the affinity
separations are chosen so that both the potential
binding protein and the target remain stable.

17. The method of claim 1 wherein the target contains
ionizable groups, further comprising providing
counter ions to reduce electrostatic repulsion
between the potential binding protein and the
target.
18. The method of claim 1 wherein the initial
potential binding domain is picked so that, under
the conditions of intended use of the desired
binding protein and under the conditions of
affinity separation, that the potential binding
domains and the target will either have opposite
charge or one of them will be neutral.

19. The method of claim 1 wherein the replicable
genetic package is a bacterial cell, such as
a strain of Escherichia coli.

20. The method of claim 1 wherein the replicable
genetic package is a bacterial spore such as
a Bacillus endospore, more preferably an endospore
of a strain of Bacillus subtilis.

21. The method of claim 1 wherein the replicable
genetic package is a bacteriophage, such as a
filamentous phage, preferably a derivative of an
M13 Escherichia coli bacteriophage or derivative

pfi. 



22. The method of claim 21 wherein the signal is
provided by the coat protein of M13 or a segment
thereof embodying an outer surface transport
signal.

23. The method of claim 21 wherein the signal is
provided by the gene III protein of M13 or a
segment thereof embodying an outer surface
transport signal.
24. The method of claim 2 wherein the distribution of
nucleotides incorporated at each variegated codon
is chosen to yield substantially equal abundances
of acidic and basic amino acids, and, preferably,
the distribution of nucleotides incorporated at
each variegated codon is further chosen to yield
the largest value for the quantity ((1 -
abundance(stop codons)) times (abundance of the
least abundant amino acid)/(abundance of the most
abundant amino acid)).
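
The quantity named in this claim can be computed directly from the nucleotide distribution at each of the three codon positions. The following is a minimal sketch under stated assumptions: it uses the standard genetic code, counts D and E as acidic and K and R as basic (excluding histidine is an assumption), and evaluates an NNK mixture that is chosen here only as a familiar example and does not come from the claims. The name `codon_statistics` is illustrative.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code in TCAG order (third base varies fastest); '*' = stop.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

ACIDIC = "DE"
BASIC = "KR"  # assumption: histidine is not counted as basic here


def codon_statistics(p1, p2, p3):
    """Return (acidic, basic, quality) for per-position base probabilities.

    p1, p2, p3 map a base letter to its probability at codon positions 1-3.
    quality = (1 - abundance(stop)) * min(abundance) / max(abundance),
    taken over the 20 amino acids.
    """
    abundance = {aa: 0.0 for aa in set(AMINO)}
    for codon, aa in CODON_TABLE.items():
        abundance[aa] += (p1.get(codon[0], 0.0)
                          * p2.get(codon[1], 0.0)
                          * p3.get(codon[2], 0.0))
    stop = abundance.pop("*")
    acidic = sum(abundance[aa] for aa in ACIDIC)
    basic = sum(abundance[aa] for aa in BASIC)
    quality = (1.0 - stop) * min(abundance.values()) / max(abundance.values())
    return acidic, basic, quality


if __name__ == "__main__":
    n = dict.fromkeys("TCAG", 0.25)  # N: equimolar mixture of all four bases
    k = {"G": 0.5, "T": 0.5}         # K: equimolar G and T
    print(codon_statistics(n, n, k))
```

For the assumed NNK mixture the acidic and basic abundances are 2/32 and 4/32, so that mixture does not meet the equal-abundance criterion. A search over the three distributions could use the returned values as its constraint and objective.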

25. The method of claim 1, wherein step (c) further
comprises contacting the packages with a second
material and isolating packages which do not bind
that second material.

26. The method of claim 1, wherein after obtaining a
novel binding protein recognizing a first
predetermined target, the novel binding protein is
chosen as a parental potential binding protein for
the isolation of a derivative protein which also
binds to a second predetermined target.

27. The method of claim 3 wherein the initially chosen
parental potential binding domain is selected from
the group consisting of (a) binding domains of
bovine pancreatic trypsin inhibitor, crambin,
ovomucoid, T4 lysozyme, hen egg white lysozyme,
ribonuclease, and azurin, and (b) domains at least
50% homologous with any of the foregoing domains
and which have a melting point of at least 80°C.
28. The method of claim 19 wherein the outer surface
transport signal is provided by the lamB protein
or a segment thereof embodying an outer surface
transport signal.

29. The method of claim 20 wherein the outer surface
transport signal is provided by the cotA, cotB,
cotC or cotD protein or a segment thereof
embodying an outer surface transport signal.



30. A chimeric protein comprising (i) at least a
segment of an outer surface protein of a cell or
virus, said segment providing an outer surface
transport signal recognized by said cell or virus,
and (ii) a domain foreign to said outer surface
protein, and, preferably, said foreign domain
binds to a target material not preferentially
bound by said outer surface protein.

31. A replicable genetic package which contains a
nucleic acid construct which codes on expression
for the chimeric protein of claim 30.

32. The method of claim 1 wherein in at least one
instance the amino acid residues varied in a first
assortment of potential binding domains are left
constant in the next assortment of potential
binding domains.



33. A method of preparing a population of variegated
DNA wherein the distribution of nucleotides
incorporated at each variegated codon is chosen to
yield substantially equal abundances of acidic and
basic amino acids, and, preferably, the
distribution of nucleotides incorporated at each
variegated codon is further chosen to yield the
largest value for the quantity ((1 - abundance(stop
codons)) times (abundance of the least abundant
amino acid)/(abundance of the most abundant amino
acid)).

34. The protein of claim 30, wherein the protein
comprises a first foreign domain recognising a
first target material and a second foreign domain
recognising a second target material.

35. The method of claim 3 wherein the initially chosen
parental potential binding domain is at least 50%
homologous with the binding domain of bovine
pancreatic trypsin inhibitor.
Group I. Claims 1-32, 34 and 35, drawn to a method of producing a
binding protein, and a protein, classified in Class 435, subclass
S3 and Class 530, subclass 38?,

distinct method species comprising those set forth in:

a) claims 3 and 5, wherein the potential binding domains are
muteins of the parental binding domains;

b) claim 27, wherein the potential binding domains are muteins of
binding domains of bovine pancreatic trypsin inhibitor;

c) claim 27, wherein the potential binding domains are muteins of
binding domains of crambin;

d) claim 27, wherein the potential binding domains are muteins of
binding domains of ovomucoid;

e) claim 27, wherein the potential binding domain is a mutein of
T4 lysozyme;

f) claim 27, wherein the potential binding domain is a mutein of
hen egg white lysozyme;

g) claim 27, wherein the potential binding domain is a mutein of
ribonuclease;

h) claim 27, wherein the potential binding domain is a mutein of
azurin;

i) claim 4, wherein the potential binding protein is a chimeric
protein;

j) claim 8, wherein the target material's potential binding
domain comprises less than 80 amino acids;

k) claim 9, wherein the target material's potential binding
domains comprise greater than 80 amino acids;

l) claim 10, wherein the target material is a mineral insoluble
in aqueous solution;

m) claim 11, wherein the target is an inorganic molecule or
complex ion that is stable in aqueous solution; and

n) claim 13, wherein the target material is an inactivated
protease.

Group II. Claim 33, drawn to a method of preparing DNA based on a
mathematical formula, classified in Class 435, subclass 172.3.

1 ^ 5 * x at 

invention concept reflected in Rule 13.2.

The process as claimed can be used to make other and
materially different products, as evidenced by the species
in Group I. Also, the product as claimed can be made by another
and materially different process, such as chemical peptide
synthesis.

No required additional search fees were timely paid by the
applicant. The international search report is restricted to the
invention first mentioned in the claims, namely generic claims 1
and 2 to the extent they read on species Ia.



